Abstract

This dissertation presents the theoretical development and
numerical implementation of a minimum cross-entropy target
detection algorithm. The procedure is based on the solution of a
nonlinear constrained cross-entropy minimization problem and
requires information in the form of raw image moments. The
detection rule involves both preprocessing and real-time
computations. The preprocessing requires the selection of a set
of target templates and the solution of the constrained
cross-entropy minimization problem for the selected target
templates. The real-time processing requires the computation of
image moments and a set of dot product operations.

An orthonormal set of "information functions" is developed and
numerical methods of converting raw image moments into the
expected values of the information functions are given. Numerical
techniques for image moment computation and a solution scheme for
the nonlinear set of constraints are developed and implemented.
The theoretical development of the detection algorithm is given
starting from a set of consistency axioms. The expected
performance is analyzed and factors determining performance are
presented. The procedure is applied to a test set of 100 images
and the detection algorithm error probability is projected and
related to the salient performance determining factors.


INFORMATION  THEORETIC  DETECTION  OF  OBJECTS 
EMBEDDED  IN  CLUTTERED  AERIAL  SCENES 

Chapter I. Introduction

The general problem considered in this dissertation is that of
characterizing and evaluating the information present in the
image plane of an optical system. The specific problem of
interest is the detection of complex man-made objects in aerial
scenes that contain confusing background information or optical
clutter. A general overview of this target-detection-in-clutter
problem can be found in Gagnon's dissertation (Gagnon, 1975),
while Harley et al. (Harley, 1977) provide an overview of typical
system parameters encountered in practice.

The image plane that is the source of information for this
detection problem is usually a photograph, which can be taken
from any airborne vehicle. Aerial photographs are classified as
either vertical or oblique depending on the angle of inclination
of the optical axis of the lens. Vertical photographs are those
taken with the optical axis of the lens pointing vertically
downward at the time of exposure. Oblique photographs are those
taken with the optical axis intentionally deviated from the
vertical. Oblique photographs are further classified as low and
high oblique based on the magnitude of the angle of deviation. A
low oblique has a relatively small angle of deviation from the
vertical and does not include the apparent horizon, that is, the
visible junction of earth and sky as seen from the camera
station. A high oblique has a relatively large angle of deviation
from the vertical and includes the apparent horizon (Whitmore,
1966:1). This dissertation will only characterize vertical
photographs taken from a known altitude; however, the methods
used in this work should also characterize at least low oblique
photographs.

Figure 1.1 shows some of the geometry involved in generating a
vertical aerial photograph. Each camera exposure produces a frame
of information; the frames are shown as a series of large
non-overlapping squares in the figure for simplicity. The frames
are also shown partitioned into K² "information cells" that form
the basic decision elements for the detection algorithm. The
objects to be located belong to one of a set of known classes and
all elements in a given class are essentially identical. The
class of objects of current interest is called the target and a
target can appear at any location and orientation within a frame.
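The partitioning of a frame into information cells can be
sketched in a few lines. This is a minimal illustration only: it
assumes the frame is stored as a square list of rows, that the
cells form a k-by-k grid, and that the side length divides evenly
by k. The name `partition_frame` is hypothetical and does not
come from the dissertation.

```python
def partition_frame(frame, k):
    """Split a square frame (a list of rows) into k*k equal square cells.

    Assumes the side length is divisible by k, which is an
    illustrative simplification rather than a stated requirement.
    """
    n = len(frame)
    if any(len(row) != n for row in frame):
        raise ValueError("frame must be square")
    if n % k != 0:
        raise ValueError("frame side must be divisible by k")
    size = n // k
    cells = []
    for ci in range(k):              # cell row index
        for cj in range(k):          # cell column index
            cell = [row[cj * size:(cj + 1) * size]
                    for row in frame[ci * size:(ci + 1) * size]]
            cells.append(cell)
    return cells


# A 4x4 frame split into a 2x2 grid of 2x2 cells, in row-major order.
frame = [[r * 4 + c for c in range(4)] for r in range(4)]
cells = partition_frame(frame, 2)
```

Each returned cell can then be treated as one decision element,
which is the role the information cells play in the text.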

With this problem formulation the only information available for
target detection is the image plane irradiance distribution
function I(x,y) that is the image of the cluttered ground scene.
Formation of this image I(x,y) from an object scene F(ξ,η)
actually represents a flow of information from the object plane
to the image plane. The carriers of information are the photons.
In traveling from the object plane to the image plane, a photon
encounters intervening physical processes such as lenses and the
atmosphere. The sum of these processes forms an information
channel. In its most basic sense, the irradiance distribution
function is no more than a superposition of photon events. These
events are photon arrivals (units of irradiance) for the image or
photon departures (units of radiance) from the object. The sum or
density of the photon events as a function of position in the
image plane defines the irradiance distribution function
(Frieden, 1979). It is this irradiance distribution function
I(x,y), or photon density, that represents the information
channel output and that must be characterized and used in the
detection algorithm.

Fig. 1.1 Generation of a Vertical Aerial Scene

Looking at the image more mathematically, let C(x,y,t,λ)
represent the spatial energy distribution of an image source of
radiant energy at spatial coordinates (x,y), at time t, and at
wavelength λ. Because light intensity is a real positive
quantity, that is, because intensity is proportional to the
modulus squared of the electric field, the image light function
is real and non-negative. Furthermore, in all practical imaging
systems there is always a small amount of background light
present. Because of this background light and the physical
restrictions imposed by the imaging system, it is assumed that

\[ 0 < C(x,y,t,\lambda) \le A \]

where A is the maximum brightness. An image is also necessarily
limited in extent by the imaging system and the recording media.
For mathematical simplicity all images in this dissertation are
assumed to be nonzero only over a square region for which

\[ -L \le x, y \le L \]

Since the image is also observable only for a finite time
(-T ≤ t ≤ T), the image light function C(x,y,t,λ) is a bounded
four-dimensional function with bounded independent variables. As
a final restriction, it is assumed that the image light function
is continuous over its domain of definition (Pratt, 1978:4). The
image light function C(x,y,t,λ) is actually at worst piecewise
continuous and is well approximated by a continuous function.

The brightness response to the image light function C(x,y,t,λ)
can now be defined for both men and machines. In men the
brightness response of a standard human observer is commonly used
to define the instantaneous luminance of the light field as shown
by

\[ Y(x,y,t) = \int C(x,y,t,\lambda)\, V_s(\lambda)\, d\lambda \]

where V_s(λ) represents the relative luminous efficiency function
or the spectral response of human vision. Similarly, the color
response of a standard human observer is measured and used in
terms of some set of tristimulus values that are linearly
proportional to the amounts of red, green, and blue light needed
to "match" a colored light. In a machine with a multispectral
imaging system the observed image field is modeled as a
spectrally weighted integral of the image light function. The ith
spectral image field is then given by

\[ F_i(x,y,t) = \int C(x,y,t,\lambda)\, S_i(\lambda)\, d\lambda \]

where S_i(λ) is the spectral response of the ith sensor (Hall,
C., 1978:17). For a monochrome imaging system, as will be used in
this dissertation, the image function F(x,y,t) nominally denotes
the image luminance or some converted or corrupted physical
representation of luminance.
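As a numerical illustration of the spectrally weighted integral,
the sketch below approximates a sensor response at a single pixel
with the trapezoidal rule. The sample wavelengths, the flat
radiance, the flat sensitivity, and the function name
`spectral_response` are all assumptions made for the example;
none of them are values or names from the text.

```python
def spectral_response(wavelengths, radiance, sensitivity):
    """Trapezoidal estimate of the integral of C(lambda) * S(lambda) d(lambda)."""
    if not (len(wavelengths) == len(radiance) == len(sensitivity)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for k in range(len(wavelengths) - 1):
        step = wavelengths[k + 1] - wavelengths[k]
        left = radiance[k] * sensitivity[k]
        right = radiance[k + 1] * sensitivity[k + 1]
        total += 0.5 * (left + right) * step
    return total


# Hypothetical samples: flat radiance seen through an ideal flat sensor.
wavelengths = [400.0, 500.0, 600.0, 700.0]   # nanometres
radiance = [2.0, 2.0, 2.0, 2.0]
sensitivity = [1.0, 1.0, 1.0, 1.0]
response = spectral_response(wavelengths, radiance, sensitivity)
```

With these flat inputs the estimate is simply the radiance times
the 300 nm bandwidth, which gives a quick sanity check on the
quadrature.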

The image function F(x,y,t) is propagated through the information
channel, or transformed from the object scene plane to the image
plane of the aircraft, to form the instantaneous irradiance
distribution. The channel transformation can be viewed as a
one-to-one mapping and is defined by

\[ I(x,y,t) = T\{F(x,y,t)\} \]

When the transformation is also assumed to be an additive linear
operator, the standard superposition integral description of the
channel output is obtained. Using the sifting property of delta
functions the mapping is first rewritten as

\[ I(x,y,t) = T\left\{ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} F(\xi,\eta,t)\, \delta(x-\xi, y-\eta)\, d\xi\, d\eta \right\} \]

Now changing the order of the general linear operator T and the
integral operator results in the expression

\[ I(x,y,t) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} F(\xi,\eta,t)\, T\{\delta(x-\xi, y-\eta)\}\, d\xi\, d\eta \]

Then defining the channel point spread function as
H(x,y;ξ,η) = T{δ(x-ξ,y-η)} gives the desired integral expression
for the channel output I(x,y,t). The superposition integral
description of the channel output or irradiance distribution is
given by

\[ I(x,y,t) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} F(\xi,\eta,t)\, H(x,y;\xi,\eta)\, d\xi\, d\eta \]

In the object detection problem of interest in this work, the
image does not change with time and the time variable can be
dropped from the instantaneous irradiance distribution to let
I(x,y) represent the spatial distribution of light in the image
plane or the light density function.
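On a sampled grid the superposition integral becomes a double
sum, which the following sketch evaluates directly. It is a
hedged illustration only: the grid is tiny, the point spread
function is passed in as an ordinary Python function, and the two
example point spread functions (`delta_psf` and `box_psf`) are
hypothetical choices, not models of any channel described in the
text.

```python
def superpose(scene, psf):
    """Discrete superposition sum: I[x][y] = sum over (xi, eta) of F * H."""
    rows, cols = len(scene), len(scene[0])
    image = [[0.0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            image[x][y] = sum(
                scene[xi][eta] * psf(x, y, xi, eta)
                for xi in range(rows)
                for eta in range(cols)
            )
    return image


def delta_psf(x, y, xi, eta):
    """Ideal channel: each object point maps to the same image point."""
    return 1.0 if (x == xi and y == eta) else 0.0


def box_psf(x, y, xi, eta):
    """Hypothetical blur: spreads each point evenly over a 3x3 neighbourhood."""
    return 1.0 / 9.0 if abs(x - xi) <= 1 and abs(y - eta) <= 1 else 0.0


scene = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
sharp = superpose(scene, delta_psf)    # reproduces the scene
blurred = superpose(scene, box_psf)    # centre point spread over all nine pixels
```

Passing the point spread function with all four arguments keeps
the general, space-variant form H(x,y;ξ,η); a shift-invariant
channel would depend only on the differences x-ξ and y-η.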

Now the normalized irradiance distribution can be defined as

\[ i(x,y) = \frac{I(x,y)}{\int\int I(x,y)\, dx\, dy} \]

The normalized distribution function has all the properties of a
bivariate probability density function since

\[ \int\int i(x,y)\, dx\, dy = 1 \]

and the probability of a photon arriving in any region R of the
image is given by the expression

\[ P(R) = \int\int_R i(x,y)\, dx\, dy \]

Several other authors have used this probability density
viewpoint in their work in image processing. Among these are
Frieden, working with image restoration techniques (Frieden,
1972), and Minerbo, in reconstructing a source from a discrete
set of projection data (Minerbo, 1979). Using this viewpoint, it
is this bivariate probability density function that must be
characterized and used in the detection algorithm.
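The probability density viewpoint is easy to illustrate on a
sampled image. The sketch below treats pixel values as
irradiance samples, normalizes them to sum to one, and computes
the photon arrival probability for a rectangular region. The
helper names `normalize` and `region_probability`, and the 2x2
test image, are assumptions introduced for the example.

```python
def normalize(image):
    """Scale non-negative pixel values so that they sum to one."""
    total = sum(sum(row) for row in image)
    if total <= 0:
        raise ValueError("image must contain some positive irradiance")
    return [[value / total for value in row] for row in image]


def region_probability(density, rows, cols):
    """Probability mass of the rectangular region rows x cols."""
    return sum(density[r][c] for r in rows for c in cols)


image = [[1.0, 3.0], [2.0, 2.0]]
density = normalize(image)
# Probability that a photon lands in the top row of the image.
p_top = region_probability(density, range(0, 1), range(0, 2))
```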


Chapter II. Maximum Entropy and
Minimum Cross-Entropy

The principles of maximum entropy and minimum cross-entropy
provide a means of approximating the normalized irradiance
distribution i(x,y) and detecting targets of interest in a
cluttered aerial scene. The approach taken in developing a
detection rule for the cluttered scene problem of this
dissertation is based on entropy and cross-entropy having unique
properties as information measures (Johnson, 1979) and on
cross-entropy minimization having unique properties as an
inference procedure (Shore, 1980). The work is an extension of
Miller's work (Miller, 1980) in approximating one-dimensional
probability density functions using a maximum entropy criterion,
and much of the background material is reviewed in his
dissertation.

Background 

The principles of entropy maximization and cross-entropy
minimization both have their roots in Shannon's work in
communication theory. For discrete, noiseless systems, maximizing
the source entropy results in the best source encoding in the
sense of enabling the highest information rate over a fixed
capacity channel (Shannon, 1948a). For continuous systems,
Shannon's definition of source rate for a fixed fidelity
criterion, or rate-distortion function, involved the minimization
of a functional (mutual information) like cross-entropy (Shannon,
1948b). However, it was Edwin Jaynes who first proposed the
principle of maximum entropy as a means of approximating an
unknown probability density function more than twenty-five years
ago (Jaynes, 1957). While the name cross-entropy is due to Good
(Good, 1963), the principle of minimum cross-entropy is a
generalization of the maximum entropy principle that was first
proposed by Kullback, who called it a principle of minimum
directed divergence or minimum discrimination information
(Kullback, 1959:37). Jaynes' work has been applied in a number of
areas, but within the engineering community the most widely known
application is Burg's Maximum Entropy Spectral Analysis (MESA)
technique (Burg, 1967). In MESA, however, the maximum entropy
principle is applied indirectly in terms of filtering, rather
than directly in terms of approximating the underlying
probability densities, and it is not widely understood that MESA
is identical to Jaynes' principle (Shore, 1981). Despite their
many proven applications, Jaynes' principle of maximum entropy
and Kullback's principle of minimum cross-entropy have had a
controversial history due to their rather intuitive justification
based on entropy's properties as an information measure.
Recently, however, Shore and Johnson have demonstrated (Shore,
1980) that these principles are correct general methods of
inference when given information in terms of expected values.
Their results rest on four consistency axioms, which are used to
demonstrate that maximizing any function other than entropy will
lead to logical inconsistencies unless that function and entropy
have identical maxima.

Definitions  and  General  Problem  Statement 

Given the historical outline, this section will describe the
general setting where the maximum entropy and minimum
cross-entropy principles can be applied and define the notation
that will be used throughout the dissertation. The main interest
in this work is approximating continuous bivariate density
functions and making logical inferences based on this
approximation. Because the cluttered aerial scene problem is
driving this review, all n-dimensional results will be presented
only for bivariate density functions.

The theory for approximating discrete probability density
functions using the principle of maximum entropy is well known
and has found a great many applications. In this problem
formulation, the underlying system has n possible states x_i and
they occur with unknown probabilities q(x_i). The system is
observed with the observations taking the form
Σ_i q(x_i) f_k(x_i) = m_k, or the expected value of a set of
"information functions" {f_k}. The problem then is to choose a
distribution e(x_i) that is in some sense the best estimate of
q(x_i) given the expected value measurements. In general, there
remains an infinite set of distributions that are not ruled out
by the expected value measurements that now serve as constraints
on any approximating distribution. The entropy principle,
however, provides a unique approximation density e(x_i) by
selecting from the infinite set of densities that satisfy the
constraints the one density with the largest entropy, defined as
-Σ_i e(x_i) log[e(x_i)].
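A small worked case may help make the discrete maximum entropy
problem concrete. The sketch below handles a single information
function, the state value itself, so the only constraint is a
prescribed mean. It relies on the standard Lagrange multiplier
result that the maximizing distribution has the exponential form
e(x_i) ∝ exp(-λ x_i); that form is a well-known result assumed
here, not something derived in this section. The function name
`maxent_distribution`, the bisection bounds, and the die example
are illustrative assumptions.

```python
import math


def maxent_distribution(values, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Maximum entropy distribution over `values` with a prescribed mean.

    Uses e_i proportional to exp(-lam * x_i) and finds lam by bisection.
    """
    def dist(lam):
        exps = [-lam * v for v in values]
        top = max(exps)                       # shift for numerical stability
        weights = [math.exp(e - top) for e in exps]
        norm = sum(weights)
        return [w / norm for w in weights]

    def mean(lam):
        return sum(p * v for p, v in zip(dist(lam), values))

    for _ in range(iters):                    # mean(lam) decreases as lam grows
        mid = 0.5 * (lo + hi)
        if mean(mid) > target_mean:
            lo = mid
        else:
            hi = mid
    return dist(0.5 * (lo + hi))


faces = [1, 2, 3, 4, 5, 6]
fair = maxent_distribution(faces, 3.5)     # no bias implied: uniform
loaded = maxent_distribution(faces, 4.5)   # mass shifts toward high faces
```

When the prescribed mean equals the mean of the uniform
distribution, the multiplier is zero and the uniform distribution
is recovered; any other mean tilts the distribution
exponentially.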

The principle of minimum cross-entropy is a generalization of the
maximum entropy principle that applies in cases when a prior
distribution p(x_i) that estimates q(x_i) is known in addition to
the measurement constraints. The principle states that, of the
infinite set of distributions e(x_i) that satisfy the
constraints, choose the one with the least cross-entropy
Σ_i e(x_i) log[e(x_i)/p(x_i)]. The connection between the two
principles occurs when the prior is a uniform density, and in
this case minimizing cross-entropy is equivalent to maximizing
entropy. The concept of cross-entropy also generalizes correctly
for continuous probability densities, unlike the concept of
maximum entropy, where only a differential entropy is defined in
the continuous case and that is not even invariant under
coordinate transformations (McEliece, 1977:38).
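The link between the two principles under a uniform prior can be
checked numerically. For a uniform prior over n states the
cross-entropy equals log n minus the entropy, so the distribution
that minimizes one maximizes the other. The sketch below verifies
that identity for one arbitrary distribution; the function names
and the sample numbers are assumptions for illustration.

```python
import math


def entropy(e):
    """Discrete entropy: minus the sum of e_i * log(e_i)."""
    return -sum(ei * math.log(ei) for ei in e if ei > 0)


def cross_entropy(e, p):
    """Discrete cross-entropy: sum of e_i * log(e_i / p_i), with p_i > 0."""
    return sum(ei * math.log(ei / pi) for ei, pi in zip(e, p) if ei > 0)


e = [0.5, 0.25, 0.125, 0.125]
uniform = [0.25] * 4

# With a uniform prior: H(e, uniform) = log(n) - entropy(e).
gap = cross_entropy(e, uniform) - (math.log(4) - entropy(e))
```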

In the case of continuous bivariate probability densities the
principle of minimum cross-entropy provides a general method of
inference about an unknown density q(x,y) when there exists a
prior estimate of q(x,y) and new information about the unknown
density in the form of expected values of the information
functions. The principle states that, of all the densities that
satisfy the expected value constraints, choose as the
approximating density the posterior e(x,y) with the least
cross-entropy

\[ H(e,p) = \int\int e(x,y)\, \log\left[\frac{e(x,y)}{p(x,y)}\right] dx\, dy \]

where p(x,y) is a prior estimate of q(x,y). Jaynes has also shown
(Jaynes, 1968) that generalizing entropy maximization to
continuous densities leads to the above cross-entropy functional
with p(x,y) being called an "invariant measure" instead of a
prior density. When using the entropy maximization principle,
there is an implicit assumption of uniform priors when viewed
from the broader cross-entropy perspective. The failure of
maximum entropy to generalize as might be expected is also
explained by this viewpoint, since a uniform prior in one
coordinate system may not be uniform in another coordinate system
(Shore, 1980).

The  Consistency  Axioms 

Shore and Johnson (Shore, 1980) have proven that given a prior
density and new information in the form of constraints on
expected values, there is only one posterior density satisfying
these constraints that can also be chosen in a manner that
satisfies a set of logical consistency axioms. In addition, this
unique posterior density can be obtained by minimizing the
cross-entropy functional. The four consistency axioms are
informally defined as follows:

1. Uniqueness: The result should be unique.

2. Invariance: New information can be accounted for
in any coordinate system.

3. System Independence: Independent information about
independent systems can be accounted for separately
in terms of different densities or together in terms
of a joint density.

4. Subset Independence: Information about an independent
subset of system states can be accounted for in
terms of a separate conditional density or in terms
of the full system density.

All four of these axioms are based on a single fundamental
principle: If a problem can be solved in more than one way, the
results should be consistent (Shore, 1980). The axioms are the
desired properties of an inference procedure rather than the
desired properties of an information measure. Using only a
general functional J(e,p) to select the posterior density e(x,y)
in the inference procedure and starting with the axioms of subset
independence and invariance, Shore and Johnson were able to show
that the first consequence of their axioms was to restrict J(e,p)
to functionals that are equivalent to the form

\[ J(e,p) = \int\int f[e(x,y), p(x,y)]\, dx\, dy \]

for some function f of two variables. This functional form is
called the "sum form" and in work previous to Shore and Johnson's
development, the sum form was assumed rather than derived
(Johnson, 1979). Then having established this functional form and
using the general axiom of invariance, they show that J is
further restricted to functionals that are equivalent to the form

\[ J(e,p) = \int\int e(x,y)\, h\left[\frac{e(x,y)}{p(x,y)}\right] dx\, dy \]


where h is some function of a single variable. Using all four
axioms, Shore and Johnson are finally able to show that J must be
equivalent to the functional

\[ J(e,p) = \int\int e(x,y)\, \log\left[\frac{e(x,y)}{p(x,y)}\right] dx\, dy \]

or J(e,p) must be equivalent to cross-entropy. Since it is
possible that no functional satisfies the consistency axioms,
their final step is to show that the cross-entropy functional
H(e,p) satisfies all four axioms. The Shore and Johnson result
has immediate application to approximating the normalized
irradiance distribution i(x,y) since it provides a logically
consistent method of approximating the light density based on
measurements in the form of expected values. The procedure to
follow then requires a prior estimate of the light density,
expected value information about the true density i(x,y), and the
functional H(e,p) to measure how much the prior density differs
from the posterior density. The principle of minimum
cross-entropy is then the correct method of incorporating all the
given information and producing a logically consistent posterior
density e(x,y) that approximates the unknown true light density
i(x,y).

Properties  of  Cross-Entropy  Minimization 

The basic properties of cross-entropy minimization are
fundamental to the problem of detecting objects in a cluttered
aerial scene using the posterior density e(x,y) as an optimum
light density approximation. Because of their importance in
developing a target detection algorithm and for completeness, I
will outline the well-known properties of cross-entropy
minimization and the notational system developed by Johnson and
Shore (Johnson, 1980). Many results dealing with cross-entropy
minimization can be efficiently stated in terms of an abstract
information operator * which takes the two known arguments of a
prior density and new expected value information to yield a
posterior density. Using this operator notation, the posterior e
is given by e = p*I, where I stands for the known constraints on
the expected values.

The problem will be stated more formally in this section to allow
concise definitions of minimum cross-entropy properties. Again in
this outline, because of the thrust in this dissertation of
approximating a bivariate density function, all results will be
presented only for the two-dimensional case. The formal problem
statement defines a point (x,y) in the x-y plane as a system
state with D the region in the plane where all states are
defined. Then S is the set of all probability densities s(x,y) on
D such that s(x,y) ≥ 0 for (x,y) ∈ D and

\[ \int\int_D s(x,y)\, dx\, dy = 1 \]

New information takes the form of linear equality constraints

\[ \int\int_D q(x,y)\, f_{ij}(x,y)\, dx\, dy = m_{ij} \]

where q(x,y) is the unknown true system density with q(x,y) ∈ S
and f_ij(x,y) are known information functions with known expected
values m_ij. The probability densities that satisfy these
constraints always comprise a convex subset Z of S (Johnson,
1979). The set Z is then termed a constraint set and in general,
a given convex region Z of S may be defined by more than one set
of information functions. The fact that the constraints form a
convex subset of S ensures the convergence of computational
methods attempting to find the minimum cross-entropy posterior
density. The expected value constraints and the resulting convex
set Z form the term I used in the abstract operator notation.

The second argument for the information operator * is a prior
density p(x,y). The density p(x,y) contained in S is an estimate
of q(x,y) which can be obtained by any means prior to learning
the average value information I. The prior density is required to
be strictly positive over D:

\[ p(x,y) > 0 \quad \text{for } (x,y) \in D \]

In making this restriction, it is assumed that D is the set of
states that is possible according to the prior information. The
restriction does not significantly restrict results, but does
avoid the technical problems that would result from division by
p(x,y) equal to zero. In a more general setting, D would be a
measurable space and p and e would be replaced by prior and
posterior probability measures. By defining probability
densities, it is implicitly assumed there is some underlying
measure with respect to which the other measures are absolutely
continuous (Kullback, 1959). Such a measure will exist when no
event with zero prior probability can have a positive posterior
probability, a condition that is guaranteed by the strictly
positive assumption for p(x,y) (Guiasu, 1977).

Given the two arguments for the information operator * (the prior
p(x,y) and new information I), the posterior density e(x,y) ∈ Z
that results from taking I into account is selected by minimizing
the cross-entropy H(s,p) in the constraint set Z:

\[ H(e,p) = \min_{s \in Z} H(s,p) \]

Using this problem statement and the cross-entropy minimization
procedure, the following properties apply to cross-entropy
minimization:

Property 1: (Uniqueness) The posterior e = p*I is unique.

The uniqueness property ensures that the solution of a given
cross-entropy minimization problem for the posterior density
e(x,y) is unique. The minimization of the functional H(e,p)
allows this unique density to be identified.

Property 2: (Prior Omnipotence) The posterior satisfies
e = p*I = p if and only if the prior satisfies p ∈ Z.

The prior omnipotence property shows that when the new
information I agrees with the assumed prior density the prior and
posterior are equal. When cross-entropy minimization is viewed as
an inference procedure, it makes sense that the posterior density
e(x,y) should be unchanged from the prior if the new information
does not contradict the prior density p(x,y) in any way.

Property 3: (Idempotence) (p*I)*I = p*I

Idempotence ensures that taking the same information into account
twice has the same effect as taking it into account once.
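Properties 2 and 3 can be observed numerically in a small
discrete setting. The sketch below handles one expected-value
constraint and uses the standard exponential-form solution
e_i ∝ p_i exp(-λ f_i), solved for λ by bisection; that solution
form is a well-known result assumed here rather than one stated
in this section. The solver name `min_cross_entropy`, the
four-state prior, and the target values are hypothetical.

```python
import math


def min_cross_entropy(prior, f, target, lo=-50.0, hi=50.0, iters=200):
    """Posterior e_i proportional to p_i * exp(-lam * f_i) with mean of f = target.

    Requires a strictly positive prior, as in the text.
    """
    def posterior(lam):
        logs = [math.log(p) - lam * fi for p, fi in zip(prior, f)]
        top = max(logs)                       # shift for numerical stability
        weights = [math.exp(v - top) for v in logs]
        norm = sum(weights)
        return [w / norm for w in weights]

    def expected(lam):
        return sum(e * fi for e, fi in zip(posterior(lam), f))

    for _ in range(iters):                    # expected(lam) decreases as lam grows
        mid = 0.5 * (lo + hi)
        if expected(mid) > target:
            lo = mid
        else:
            hi = mid
    return posterior(0.5 * (lo + hi))


prior = [0.1, 0.2, 0.3, 0.4]
f = [1.0, 2.0, 3.0, 4.0]

# Prior omnipotence: the prior mean of f is already 3.0, so nothing changes.
same = min_cross_entropy(prior, f, 3.0)

# Idempotence: applying the constraint <f> = 2.5 a second time changes nothing.
once = min_cross_entropy(prior, f, 2.5)
twice = min_cross_entropy(once, f, 2.5)
```

In both checks the multiplier found by the bisection is
numerically zero, which is why the input density is returned
unchanged.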

Property 4: (Information Intersection) Let I_1 be the information
I_1 = (q ∈ Z_1), where this notation denotes that q is a member
of the constraint set Z_1 ⊆ S created by the constraints I_1, and
I_2 the information I_2 = (q ∈ Z_2), for overlapping constraint
sets Z_1, Z_2 ⊆ S. If (p*I_1) ∈ Z_2 holds, then

\[ p*I_1 = (p*I_1)*(I_1 \cap I_2) = (p*I_1)*I_2 = p*(I_1 \cap I_2) \]

holds.

The information intersection property is similar to the prior
omnipotence property. The result shows that when I_1 is taken
into account, if the resulting posterior density p*I_1 already
satisfies the constraints imposed by the additional information
I_2, then taking I_2 into account in various ways has no effect
on the posterior density.

Property 5: (Invariance) Let T be a coordinate transformation
from (x,y) ∈ D to (u,v) ∈ R with (Te)(u,v) = J^{-1} e(x,y), where
J is the Jacobian J = ∂(u,v)/∂(x,y). Let TS be the set of
densities Te corresponding to densities e ∈ S. Let (TZ) ⊆ (TS)
correspond to Z ⊆ S. Then

\[ (Tp)*(TI) = T(p*I) \]

and

\[ H[T(p*I), Tp] = H(p*I, p) \]

hold, where

\[ TI = [(Tq) \in (TZ)] \]

or Tq is a member of the constraint set TZ ⊆ TS created by the
constraints TI.

The invariance property states that the same answer is obtained
when an inference problem is solved in two different coordinate
systems, in that the posterior densities in the two systems are
related by the coordinate transformation. Also, the cross-entropy
between the posteriors and the priors has the same value in both
coordinate systems.
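The reason the cross-entropy value survives a change of
coordinates, while a differential entropy does not, can be seen
in a short worked calculation. The lines below are a sketch under
the assumption of a smooth one-to-one transformation, with |J|
written for the absolute value of the Jacobian determinant (the
absolute value is an assumption added here to cover
orientation-reversing maps).

```latex
% Densities transform as (Te)(u,v) = e(x,y)/|J| and du\,dv = |J|\,dx\,dy.
\begin{align*}
H(Te, Tp)
  &= \int\int \frac{e(x,y)}{|J|}
     \log\!\left[\frac{e(x,y)/|J|}{p(x,y)/|J|}\right] |J|\, dx\, dy \\
  &= \int\int e(x,y) \log\!\left[\frac{e(x,y)}{p(x,y)}\right] dx\, dy
   = H(e,p).
\end{align*}
% By contrast, the differential entropy picks up a Jacobian term:
\begin{align*}
-\int\int (Te) \log (Te)\, du\, dv
  = -\int\int e \log e\, dx\, dy + \int\int e \log |J|\, dx\, dy .
\end{align*}
```

The Jacobian factors cancel inside the logarithm because the
posterior and the prior transform in the same way, which is the
mechanism behind the second relation of the invariance property.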

Property 6: (System Independence) Let there be two systems, with
sets D_1 and D_2 of states and probability densities of states
e_1 ∈ S_1 and e_2 ∈ S_2. Let p_1 ∈ S_1 and p_2 ∈ S_2 be prior
densities. Let I_1 = (q_1 ∈ Z_1) and I_2 = (q_2 ∈ Z_2) be new
information about the two systems, where Z_1 ⊆ S_1 and
Z_2 ⊆ S_2. Then

\[ (p_1 p_2)*(I_1 \cap I_2) = (p_1*I_1)(p_2*I_2) \]

and

\[ H(e_1 e_2, p_1 p_2) = H(e_1,p_1) + H(e_2,p_2) \]

hold where e_1 = p_1*I_1 and e_2 = p_2*I_2.

The system independence property shows that it does not matter
whether independent information about two systems is accounted
for separately or together in terms of a joint density. Whether
or not the two systems are in fact independent is irrelevant
since the property applies as long as there are independent
priors and independent new information.
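The additive relation in the system independence property can be
checked directly for discrete product densities. The sketch below
verifies only that additivity of cross-entropy; it does not solve
the two minimization problems, and the densities used are
arbitrary illustrative numbers rather than actual posteriors.

```python
import math


def cross_entropy(e, p):
    """Discrete cross-entropy: sum of e_i * log(e_i / p_i), with p_i > 0."""
    return sum(ei * math.log(ei / pi) for ei, pi in zip(e, p) if ei > 0)


def product(a, b):
    """Joint density of two independent systems, flattened to one list."""
    return [ai * bj for ai in a for bj in b]


e1, p1 = [0.7, 0.3], [0.5, 0.5]
e2, p2 = [0.2, 0.5, 0.3], [0.4, 0.4, 0.2]

joint = cross_entropy(product(e1, e2), product(p1, p2))
separate = cross_entropy(e1, p1) + cross_entropy(e2, p2)
```

The two quantities agree to rounding error because the logarithm
of a product ratio splits into a sum and each marginal sums to
one.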

Property 7: (Triangle Relations) For any r(x,y) ∈ Z

\[ H(r,p) \ge H(r,e) + H(e,p) \]

where e = p*I. When I is determined by a finite set of equality
constraints only, equality holds.

The triangle equality is important for all applications in which
cross-entropy minimization is used for purposes of classification
or pattern recognition.
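The equality case can be made plausible with a short
calculation. The sketch below assumes a finite set of equality
constraints, written with a single index k for brevity, and the
standard exponential form of the minimum cross-entropy posterior
with multipliers λ_0 and λ_k; both the indexing and the
multiplier symbols are notation introduced here, not taken from
the text.

```latex
% Assumed: constraints \int\int s\, f_k\, dx\, dy = m_k  (k = 1,\dots,M)
% and posterior e = p \exp(-\lambda_0 - \sum_k \lambda_k f_k).
\begin{align*}
H(r,p) - H(r,e)
  &= \int\int r \log\frac{e}{p}\, dx\, dy
   = -\lambda_0 - \sum_k \lambda_k m_k
     \qquad \text{for any } r \in Z, \\
H(e,p)
  &= \int\int e \log\frac{e}{p}\, dx\, dy
   = -\lambda_0 - \sum_k \lambda_k m_k
     \qquad \text{since } e \in Z .
\end{align*}
% Hence H(r,p) = H(r,e) + H(e,p).
```

Both r and e satisfy the same expected-value constraints, so the
two right-hand sides coincide, which gives the equality.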

Property 8: (Posterior Convergence) The relationship

\[ H(q, p*I) \le H(q,p) \]

holds, with equality if and only if p*I = p.

The posterior convergence property states that the posterior
density e(x,y) is always closer to the scene density q(x,y) in
the cross-entropy sense than is the prior density p(x,y).

Property 9: (Piecemeal Information) Let the system
have a probability density $q \in S$, and let there be information
$I_1 = (q \in Z_1)$ and $I_2 = (q \in Z_2)$, where $Z_1, Z_2 \subseteq S$
are constraint sets with non-empty intersection. Given that
$Z_1$ is determined by a set of equality constraints only, then

$$(p * I_1) * (I_1 \wedge I_2) = p * (I_1 \wedge I_2)$$

and

$$H(e,p) = H(e,e_1) + H(e_1,p)$$

hold where $e = p * (I_1 \wedge I_2)$ and $e_1 = p * I_1$.

The piecemeal information property is also important
because of its application in classification and pattern
recognition. In general, this result is important in any
application where the constraint information arrives piecemeal:
it states that intermediate posterior densities can be used
as priors in computing final posterior densities without
affecting the results.

There are additional cross-entropy minimization properties
of general interest not covered in this listing. The
additional properties will be developed and discussed in the
next chapter. Chapter III will develop a minimum cross-entropy
posterior density approximation and a target detection
algorithm based on the approximation and properties of
minimum cross-entropy densities.


Chapter III. Detection Algorithm Development


Introduction 

The theoretical development of an algorithm for detecting
complex man-made objects in cluttered aerial scenes will
be presented in this chapter. The introduction to this
dissertation outlined the general framework for the object
detection problem and showed how scene frames are partitioned
into $K^2$ information cells. The detection algorithm developed
in this chapter is then sequentially applied to each information
cell in a frame, resulting in all cells being classified
as containing targets or only clutter. To develop the
detection algorithm, the irradiance distribution function for
the ith cell in the jth frame will be denoted $Q_{ij}(x,y)$.
The normalized irradiance distribution function is denoted
$q_{ij}(x,y)$ and has all the properties of a bivariate probability
density function. Following the notation of previous
chapters, $e_{ij}(x,y)$ is the minimum cross-entropy approximation
to the ith cell and jth frame normalized irradiance
distribution function $q_{ij}(x,y)$. The computation of the
approximation $e_{ij}(x,y)$ requires a prior ith cell and jth
frame density $p_{ij}(x,y)$ and new expected value information
$I_{ij}$. Throughout the remaining sections of this dissertation
it is assumed we are working with the ith information cell
in the jth frame of an aerial scene, and the explicit reference
to the cell and frame number will be dropped unless it is
required for clarity.

Solution  of  the  Constraint  Equations 

To classify cell density functions $q_{ij}(x,y)$ as containing
targets or only clutter requires an explicit procedure
for obtaining the minimum cross-entropy density $e_{ij}(x,y)$
based on a set of information function expected value
relations. The information functions $f_k(x,y)$ used in the
minimum cross-entropy inference procedure are critical components
of the detection algorithm and will be explored fully in the
next chapter. The expression for the minimum cross-entropy
posterior density can be found given that the number and
forms of the information functions are specified and their
expected values have been computed over the information cell
or, symbolically, given that $f_k(x,y)$ and $m_k$, $k = 0,1,2,\ldots,t$,
are known. The minimum cross-entropy posterior approximation of
q(x,y) will then be the continuous density e(x,y) defined
on the region $-C \leq x,y \leq C$ that has a prior representation
p(x,y) and satisfies the new expected value information
I. The mathematical statement of the problem is to find
the density e(x,y) that minimizes

$$H(e,p) = \iint e(x,y)\,\ln\left[\frac{e(x,y)}{p(x,y)}\right] dx\,dy$$

subject to

$$\iint e(x,y)\,dx\,dy = 1$$

$$\iint f_k(x,y)\,e(x,y)\,dx\,dy = m_k, \qquad k = 1,2,\ldots,t$$


The information functions $f_k(x,y)$, $k = 1,2,\ldots,t$, are continuous
and bounded on the region $-C \leq x,y \leq C$. The problem
stated above is a constrained minimum problem and can be
solved using the Lagrange method of undetermined coefficients.
The Lagrangian $L[e(x,y),\lambda]$ is then formed as follows
(Luenberger, 1969:213):

$$L[e(x,y),\lambda] = -H(e,p) - \lambda_0\left[\iint e(x,y)\,dx\,dy - 1\right] - \sum_{j=1}^{t}\lambda_j\left[\iint f_j(x,y)\,e(x,y)\,dx\,dy - m_j\right]$$


Using  the  expression  for  cross-entropy,  the  Lagrangian  can 
be  expressed  as: 


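$$L[e(x,y),\lambda] = \iint e(x,y)\left\{\ln\left[\frac{p(x,y)}{e(x,y)}\right] - \lambda_0 - \sum_{j=1}^{t}\lambda_j f_j(x,y)\right\}dx\,dy + \lambda_0 + \sum_{j=1}^{t}\lambda_j m_j$$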

The Lagrangian can also be written in the form:

$$L[e(x,y),\lambda] = \iint e(x,y)\,\ln\left\{\frac{p(x,y)}{e(x,y)}\exp\left[-\lambda_0 - \sum_{j=1}^{t}\lambda_j f_j(x,y)\right]\right\}dx\,dy + \lambda_0 + \sum_{j=1}^{t}\lambda_j m_j$$

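The fact that for Z > 0 the natural logarithm is
bounded by

$$\ln(Z) < Z - 1 \quad \text{if } Z \neq 1$$

$$\ln(Z) = Z - 1 \quad \text{if } Z = 1$$

provides a method of bounding the Lagrangian. Applying this
property of natural logarithms to the integrand gives the relationship:

$$L[e(x,y),\lambda] \leq \iint e(x,y)\left\{\frac{p(x,y)}{e(x,y)}\exp\left[-\lambda_0 - \sum_{j=1}^{t}\lambda_j f_j(x,y)\right] - 1\right\}dx\,dy + \lambda_0 + \sum_{j=1}^{t}\lambda_j m_j$$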

The goal of this procedure is to maximize the Lagrangian
$L[e(x,y),\lambda]$ and therefore e(x,y) must be selected to
provide equality in the last expression. Again, using the
property of natural logarithms, equality occurs if and only if

$$e(x,y) = p(x,y)\exp\left[-\lambda_0 - \sum_{j=1}^{t}\lambda_j f_j(x,y)\right]$$

the point where Z = 1. The preceding result is well-known
(Johnson, 1980:27); however, the derivation given here is
unique to this dissertation. The derivation of the minimum
cross-entropy density seen here is an extension of Miller's
work in maximum entropy and univariate densities (Miller,
1980:29).

The expression above for e(x,y) provides the required
form of the minimum cross-entropy density that approximates
the unknown true density q(x,y). Given a specific set of
expected values $\{m_k \mid k = 0,1,\ldots,t\}$, we solve the t + 1
constraint equations for $\lambda = (\lambda_0, \lambda_1, \ldots, \lambda_t)^T$ to then completely
determine e(x,y). The method of solving the given set of
nonlinear constraint equations for the lambda vector will be
presented in Chapter V.
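As an illustration only, the exponential form above can be exercised on a discretized cell. The following minimal sketch assumes a handful of sample points in place of the continuous region (so integrals become sums), a single information function, and a plain bisection search. The names `posterior` and `solve_lambda` are hypothetical, and the bisection is a stand-in, not the solution scheme of Chapter V.

```python
import math

def posterior(p, f, lam1):
    """Return (e, lam0) with e_i = p_i * exp(-lam0 - lam1 * f_i) and sum(e) = 1."""
    w = [pi * math.exp(-lam1 * fi) for pi, fi in zip(p, f)]
    z = sum(w)                       # normalization forces exp(lam0) = z
    return [wi / z for wi in w], math.log(z)

def solve_lambda(p, f, m, lo=-50.0, hi=50.0, iters=200):
    """Bisection on lam1 until the posterior expected value of f equals m."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        e = posterior(p, f, mid)[0]
        mean = sum(fi * ei for fi, ei in zip(f, e))
        if mean > m:                 # the expected value falls as lam1 grows
            lo = mid
        else:
            hi = mid
    lam1 = 0.5 * (lo + hi)
    e, lam0 = posterior(p, f, lam1)
    return lam0, lam1, e

f = [-1.0, -0.5, 0.0, 0.5, 1.0]      # one information function at five sample points (assumed)
p = [0.2] * 5                        # uniform prior
lam0, lam1, e = solve_lambda(p, f, 0.3)
```

With a uniform prior and a target expected value above the prior's value of zero, the multiplier comes out negative, which tilts the posterior toward the sample points where the information function is largest.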

From property one of Chapter II, we know the minimum
cross-entropy posterior density e(x,y) is unique. In terms
of the abstract information operator *, a solution to the
cross-entropy minimization problem, if one exists, is unique
provided only that H(e,p) is not identically infinite as
e(x,y) ranges over the constraint set Z. A condition that
guarantees the existence of a solution is that, in addition to
containing a density e(x,y) with finite cross-entropy, the
constraint set Z is closed (Johnson, 1980:5). For Z to
be closed, it suffices in turn that the constraint functions
$f_k(x,y)$ are bounded. Conversely, given values of
$\lambda = (\lambda_0, \lambda_1, \ldots, \lambda_t)^T$ such that all constraints are satisfied,
then the solution exists and is given by the above minimum
cross-entropy expression for e(x,y). Conditions for the
general existence of solutions to the constrained minimization
problem are also discussed by Csiszar (Csiszar, 1975).
For this work with normalized irradiance distributions using
a finite set of bounded information functions $f_j(x,y)$ and
only equality constraints, the solution to the constrained
minimization problem will always exist and have the unique
form for e(x,y) derived in this section as the minimum
cross-entropy density.

Solution  Characteristics 

In general, cross-entropy H(e,p) measures how much
e(x,y) differs from the prior p(x,y). The cross-entropy
at the minimum can be expressed in terms of the Lagrange
multipliers and the expected values of the information
functions. Starting with the expression for the minimum
cross-entropy density, or

$$e(x,y) = p(x,y)\exp\left[-\lambda_0 - \sum_{j=1}^{t}\lambda_j f_j(x,y)\right]$$

and rearranging gives the expression

$$\ln\left[\frac{e(x,y)}{p(x,y)}\right] = -\lambda_0 - \sum_{j=1}^{t}\lambda_j f_j(x,y)$$

Now multiplying by e(x,y) and integrating over the information
cell gives the expression

$$\iint e(x,y)\,\ln\left[\frac{e(x,y)}{p(x,y)}\right]dx\,dy = -\lambda_0\iint e(x,y)\,dx\,dy - \sum_{j=1}^{t}\lambda_j\iint f_j(x,y)\,e(x,y)\,dx\,dy$$

Therefore, the cross-entropy H(e,p) at the minimum point
is given by

$$H(e,p) = -\lambda_0 - \sum_{j=1}^{t}\lambda_j m_j$$


Kullback has also shown that cross-entropy in general
satisfies the relationship

$$H(e,p) \geq 0$$

with equality only if p(x,y) = e(x,y) almost everywhere
(Kullback, 1959). Informally, H(e,p) is a measure of the
information divergence between the density function e(x,y)
and a prior density function p(x,y). Then, using H(e,p)
as an information divergence measure and since e = p*I
minimizes H(e,p), the posterior approximation for q(x,y) is as
close as possible in an information-measure sense to the
prior density while at the same time satisfying the new
information constraints I taken from the unknown cell density
q(x,y).
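The closed form for the cross-entropy at the minimum can be checked numerically. The sketch below is illustrative only: it assumes a four-point discretization, two made-up information functions and arbitrarily chosen multipliers, builds the exponential-form density, and compares the directly summed cross-entropy with the expression in terms of the multipliers and expected values. The helper name `cross_entropy` is hypothetical.

```python
import math

def cross_entropy(a, b):
    """Discrete stand-in for H(a,b): sum of a_i * ln(a_i / b_i)."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0.0)

p = [0.25, 0.25, 0.25, 0.25]                 # uniform prior over four sample points
f = [[-1.0, -0.5, 0.5, 1.0],                 # assumed information function f_1
     [1.0, -1.0, -1.0, 1.0]]                 # assumed information function f_2
lam = [0.7, -0.3]                            # arbitrarily chosen lambda_1, lambda_2

# Exponential-form density; lambda_0 is fixed by the normalization constraint.
weights = [p[i] * math.exp(-sum(l * fj[i] for l, fj in zip(lam, f)))
           for i in range(len(p))]
lam0 = math.log(sum(weights))
e = [w / sum(weights) for w in weights]

# Expected values of the information functions under e.
m = [sum(fj[i] * e[i] for i in range(len(e))) for fj in f]

direct = cross_entropy(e, p)
closed_form = -lam0 - sum(l * mj for l, mj in zip(lam, m))
```

Here `direct` and `closed_form` agree to rounding error, and both are nonnegative, in line with the Kullback inequality.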


Further Minimum Cross-Entropy Properties

The properties presented in this section highlight
cross-entropy's ability to measure how much a posterior
density differs from the assumed prior density. Even though
cross-entropy does not have all the properties of a metric,
H(e,p) does have other properties that make it ideal for
use in a target detection algorithm. These properties are
presented in this section and then used in the next section
to develop the minimum cross-entropy detection rule.

Triangle Equality: Let I be the constraints given by

$$\iint_{cell} f_k(x,y)\,q(x,y)\,dx\,dy = m_k, \qquad k = 1,2,\ldots,t$$

and let p(x,y) be any prior probability density. Then

$$H(q,p) = H(q,\,p * I) + H(p * I,\,p)$$


The minimum cross-entropy posterior estimate of q(x,y)
is both logically consistent (four consistency axioms) and
closer to q(x,y) as measured by cross-entropy than the
prior density p(x,y). Also, the difference H(q,p) - H(q,e)
is exactly the cross-entropy H(e,p) between the posterior
and the prior. Therefore, H(e,p) can be interpreted as the
amount of information provided by the constraints I that is
not inherent in p(x,y). The posterior accessibility property
also shows that the difference H(q,p) - H(q,e) will
equal zero when the correct expected value constraints are
provided.
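A small numerical check of the triangle equality can be sketched as follows. This is an illustration under stated assumptions: a five-point discretization, one information function, and a scene density q built by perturbing an exponential-form density e in a direction that changes neither its total mass nor its expected value, so that e is the minimum cross-entropy posterior for the constraint taken from q. All names and numbers are hypothetical.

```python
import math

def cross_entropy(a, b):
    """Discrete stand-in for H(a,b): sum of a_i * ln(a_i / b_i)."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0.0)

f = [-1.0, -0.5, 0.0, 0.5, 1.0]          # one information function (assumed)
p = [0.2] * 5                            # uniform prior

# Posterior of exponential form for an arbitrarily chosen multiplier.
lam1 = 0.5
w = [pi * math.exp(-lam1 * fi) for pi, fi in zip(p, f)]
e = [wi / sum(w) for wi in w]

# delta sums to zero and has zero expected value of f, so q and e
# satisfy the same normalization and expected value constraint.
delta = [1.0, -2.0, 0.0, 2.0, -1.0]
q = [ei + 0.02 * di for ei, di in zip(e, delta)]

lhs = cross_entropy(q, p)                          # H(q,p)
rhs = cross_entropy(q, e) + cross_entropy(e, p)    # H(q,e) + H(e,p)
```

The two sides agree to rounding error, and H(q,e) is smaller than H(q,p) by exactly H(e,p), which is the sense in which the posterior is closer to the scene density than the prior.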

Posterior Accessibility: For any density d(x,y) there
exist constraints $I_d$ such that $d = p * I_d$ for any prior
density p(x,y).

This  property  due  to  Csiszar  (Csiszar,  1975)  shows  that 
H(d,p)  is  in  general  the  amount  of  information  needed  to 
determine  d(x,y)  when  given  the  prior  p(x,y).  The  result 
also shows that the cross-entropy H(d,p) measures the error
introduced  by  using  p(x,y)  instead  of  the  true  density 
d(x,y).  Used  as  an  error  measure,  the  posterior  accessibility 
property  will  allow  the  template  to  scene  cross-entropy 
H(q,t)  to  provide  a  "metric"  for  measuring  the  detection 
rule's  sensitivity  to  variations  in  the  performance  determin¬ 
ing  parameters  presented  in  Chapter  V.  The  next  property 
also  shows  that  the  minimum  cross-entropy  template  provides 
the  minimum  error  possible  when  the  template  is  restricted 
to  an  exponential  form. 

Expected Value Matching: Let I be the constraints

$$\iint_{cell} f_k(x,y)\,q(x,y)\,dx\,dy = m_k, \qquad k = 1,2,\ldots,t$$

for a fixed set of information functions $f_k(x,y)$ and let

$$e = p * I$$

be the result of taking this information I into
account. Then for an arbitrary fixed density d(x,y) the
cross-entropy H(d,e) = H(d,p*I) has its minimum value when
the constraints satisfy

$$m_k = m_k^{(d)} = \iint_{cell} d(x,y)\,f_k(x,y)\,dx\,dy, \qquad k = 1,2,\ldots,t$$

This result is due to Johnson and Shore (Johnson, 1980)
and is a generalization of a property of orthogonal
polynomials that in the case of speech analysis is called the
"correlation matching property" (Markel, 1976). This
result ensures that when a minimum cross-entropy density
e(x,y) has the general form

$$e(x,y) = p(x,y)\exp\left[-\lambda_0 - \sum_{j=1}^{t}\lambda_j f_j(x,y)\right]$$

then H(d,e) is smallest when the expectations of e(x,y)
match those of the arbitrary density d(x,y). Therefore, in
general it follows that e = p*I is not only the density that
minimizes the prior to posterior cross-entropy H(e,p), but
it is also the density of the general form shown above that
minimizes the posterior to scene cross-entropy H(q,e) since
d(x,y) was an arbitrary density (Shore, 1980). Hence,
e(x,y) is not only closer to q(x,y) than is p(x,y), but
it is the closest possible density of the exponential form
given above for e(x,y).

The final property of cross-entropy minimization required
to develop a target in clutter detection algorithm is
the posterior adaptation property presented by Johnson and
Shore (Johnson, 1980).

Posterior Adaptation: Let $I_1$ and $I_2$ stand respectively
for the information constraints

$$\iint_{cell} f_j(x,y)\,q_1(x,y)\,dx\,dy = m_j^{(1)}$$

and

$$\iint_{cell} f_j(x,y)\,q_2(x,y)\,dx\,dy = m_j^{(2)}$$

which involve the same set of information functions $f_j(x,y)$
where $j = 1,2,\ldots,t$. Then

$$(p * I_1) * I_2 = p * I_2$$

and

$$H(e_2,p) = H(e_2,e_1) + H(e_1,p) + \sum_{j=1}^{t}\lambda_j^{(1)}\left(m_j^{(1)} - m_j^{(2)}\right)$$

hold where $e_1 = p * I_1$, $e_2 = p * I_2$ and $\lambda_j^{(1)}$ are the Lagrange
multipliers associated with $e_1 = p * I_1$.


The application of this property views $q_1(x,y)$ and
$q_2(x,y)$ as unknown system or scene probability densities
at two different points in time. Then $q_1(x,y)$ is used as
a prior estimate or template for $q_2(x,y)$. The posterior
adaptation property shows that when $I_2$ is determined by
expectations of the same information functions that also
produced $I_1$, the results of producing a posterior $e_1(x,y)$
using $I_1$ are completely wiped out by subsequently producing
a posterior $e_2(x,y)$ using $I_2$. The posterior adaptation
property is shown graphically in Figure 3.1.

The  Detection  Algorithm 

Using a constant set of information functions
$\{f_j(x,y) \mid j = 0,1,2,\ldots,t\}$ (see Chapter IV) and a uniform prior
density p(x,y), the posterior adaptation property serves as
a starting point for the detection algorithm. The information
$I^{(k)}$ is obtained from a set of predefined template scenes
$q^{(k)}(x,y)$ where $k = 1,2,\ldots,2Q$. These template scenes model
the target of interest and various possible clutter
configurations to provide the detection rule with Q target versus
clutter alternatives. With this information a set of
minimum cross-entropy (maximum entropy) template densities
$t^{(k)}(x,y)$ where $k = 1,2,\ldots,2Q$ can be defined as $t^{(k)} = p * I^{(k)}$,
corresponding to the $e_1(x,y)$ density in the posterior
adaptation result.

In a more general setting, when there are N targets of
interest the minimum cross-entropy template densities will
be defined as

$$t^{(lk)}(x,y) = p(x,y) * I^{(lk)}$$

where p(x,y) is the uniform prior density and $I^{(lk)}$ is the
expected value information obtained from the template scenes
as

$$\iint_{cell} f_j(x,y)\,q^{(lk)}(x,y)\,dx\,dy = m_j^{(lk)}$$

where

$$j = 1,2,\ldots,t \qquad k = 1,2,\ldots,Q \qquad l = 0,1,\ldots,N$$


The production of the Q(N + 1) template scenes
$q^{(lk)}(x,y)$ is presented in Chapter VI; however, the basic
principle uses a master target template to represent each of the
N target classes. Each master target template is then
superimposed on the Q different clutter backgrounds to offer
Q target and clutter configurations to the detection
algorithm for each target class. Appendix A shows eighteen
three-dimensional template scenes used to test the detection rule,
nine of which represent a tank in clutter and nine of which
represent only clutter. This set of template scenes, where
N equals 1 and Q equals 9, provided the expected value
information used to produce the template densities shown in
Appendix B. In the general setting and using the resulting
template densities $t^{(lk)}(x,y)$, every information cell analyzed
by the detection rule will be classified as only clutter or
as containing one of the N targets of interest. The
classification is based on the ability of cross-entropy $H(q,t^{(lk)})$
to measure how much the cell density q(x,y) differs from
the template densities $t^{(lk)}(x,y)$.

To develop the actual classification rule several other
minimum cross-entropy properties must also be used. Using
the same set of information functions $\{f_j(x,y) \mid j = 0,1,2,\ldots,t\}$
used to construct the minimum cross-entropy template densities
$t^{(lk)}(x,y)$, measurements of the information functions' expected
values are taken from the scene density q(x,y) of the ith
cell and the pth frame of the aerial scene. These
measurements form a set of constraints I on the posterior density
and are obtained as

$$\iint_{cell} f_j(x,y)\,q(x,y)\,dx\,dy = m_j, \qquad j = 1,2,\ldots,t$$

to form a measurement vector M. Using this constraint
information coupled with the prior density p(x,y) will allow a
minimum cross-entropy posterior density to be produced as
e = p*I. Figure 3.2 shows all the densities used in this
development of the detection algorithm and how these
densities evolve as new expected value information is used in the
minimum cross-entropy procedure.

The cell information I can also be applied to the
(N + 1)Q minimum cross-entropy template densities $t^{(lk)}(x,y)$.
Abstractly, this procedure forms a new set of (N + 1)Q
adapted template densities using the new expected value
information I and the predefined template densities $t^{(lk)}(x,y)$
as priors. Using the operator notation, the adapted densities are
constructed as

$$t^{(lk)} * I$$

where

$$l = 0,1,2,\ldots,N \qquad k = 1,2,\ldots,Q$$

Figure 3.2 provides a complete summary of the detection
algorithm notation and minimum cross-entropy densities being
generated. The scene density q(x,y) represents a general
information cell that must be classified by the algorithm.
Now the triangle equality can be applied to show that

$$H(q,t^{(lk)}) = H(q,\,t^{(lk)} * I) + H(t^{(lk)} * I,\,t^{(lk)})$$

where $l = 0,1,2,\ldots,N$ and $k = 1,2,\ldots,Q$. Also the adapted
template densities were obtained as

$$t^{(lk)} * I = (p * I^{(lk)}) * I$$

Using the posterior adaptation property results in the
expression

$$t^{(lk)} * I = p * I = e$$

where again $l = 0,1,2,\ldots,N$ and $k = 1,2,\ldots,Q$. This result
shows that all (N + 1)Q adapted template densities shown in
Figure 3.2 are equal to the single posterior density e(x,y).
Returning to the triangle equality with this result gives

$$H(q,t^{(lk)}) = H(q,e) + H(e,t^{(lk)})$$

with $l = 0,1,2,\ldots,N$ and $k = 1,2,\ldots,Q$.

Fig. 3.2. Density Function Evolution

The cross-entropy $H(q,t^{(lk)})$ is the amount of information
needed to determine the true information cell density
q(x,y) given the predefined template density $t^{(lk)}(x,y)$, or
it is a measure of how much q(x,y) differs from the template
density. The triangle equality given above shows that the
total "distance" between q(x,y) and the template density
$t^{(lk)}(x,y)$, or $H(q,t^{(lk)})$, is the sum of two components. The
expected value matching property has shown that the first
term H(q,e) already represents a minimum "distance" between
q(x,y) and the best posterior estimate of q(x,y) with the
required exponential form. The second variable term
$H(e,t^{(lk)})$ is the "distance" from the template density
$t^{(lk)}(x,y)$ to the minimum cross-entropy posterior density
e(x,y).

The strategy for a detection algorithm is now to use the
expected value matching property of the minimum cross-entropy
procedure. Since H(q,e) has previously been shown to
have its minimum possible value, the detection rule must
select, from the set of (N + 1)Q total templates, the template
density $t^{(\alpha\beta)}(x,y)$ that minimizes the
cross-entropy $H(e,t^{(\alpha\beta)})$. The triangle equality for
cross-entropy therefore results in a detection rule based on the
"distance" between minimum cross-entropy template densities
$t^{(lk)}(x,y)$ and the minimum cross-entropy scene density e(x,y).
The rule requires that we find $\alpha\beta$ such that

$$H(e,t^{(\alpha\beta)}) \leq H(e,t^{(lk)})$$

as l varies from 0 to N and k varies from 1 to Q.

The detection rule that results is equivalent to the
classification rule given by Kullback (Kullback, 1959:33). Using
this detection rule results in finding the template density
$t^{(\alpha\beta)}(x,y)$ that differs the least from the true information
cell density q(x,y). The numerical value of $\alpha$ will then
indicate if the ith information cell in the pth frame contains
only clutter or one of the N objects of interest.

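Now to implement the detection rule numerically a
modification of a result provided by Gray and Shore (Gray, 1980)
for a speech coding technique will be used. The second result
from the posterior adaptation property stated that

$$H(e_2,p) = H(e_2,e_1) + H(e_1,p) + \sum_{j=1}^{t}\lambda_j^{(1)}\left(m_j^{(1)} - m_j^{(2)}\right)$$

Rearranging and changing to the target density notation gives

$$H(e,t^{(lk)}) = H(e,p) - H(t^{(lk)},p) - \sum_{j=1}^{t}\lambda_j^{(lk)}\left(m_j^{(lk)} - m_j\right)$$

where $\lambda_j^{(lk)}$ are the Lagrange multipliers used in the
predefined template density $t^{(lk)}(x,y)$. The template densities
are also obtained through a minimum cross-entropy procedure
and therefore

$$H(t^{(lk)},p) = -\lambda_0^{(lk)} - \sum_{j=1}^{t}\lambda_j^{(lk)} m_j^{(lk)}$$

is the cross-entropy at the minimum. Then substitution of
this expression into the $H(e,t^{(lk)})$ expression gives

$$H(e,t^{(lk)}) = H(e,p) + \lambda_0^{(lk)} + \sum_{j=1}^{t}\lambda_j^{(lk)} m_j$$

The term H(e,p) is a constant for all template densities
and will not enter into the decision rule. The detection
rule can thus be implemented numerically as

Find $\alpha\beta$ such that

$$\lambda_0^{(\alpha\beta)} + \sum_{j=1}^{t}\lambda_j^{(\alpha\beta)} m_j \leq \lambda_0^{(rs)} + \sum_{j=1}^{t}\lambda_j^{(rs)} m_j$$

as r varies from 0 to N and s varies from 1 to Q.
Defining (N + 1)Q Lagrange multiplier vectors $[\lambda_{rs}]$ by

$$[\lambda_{rs}] = \left(\lambda_0^{(rs)}, \lambda_1^{(rs)}, \ldots, \lambda_t^{(rs)}\right)^T$$

and an augmented measurement vector as

$$[M] = (1, m_1, m_2, \ldots, m_t)^T$$

allows the detection algorithm to be compactly expressed as
a dot product operation.

Find $\alpha\beta$ such that

$$[\lambda_{\alpha\beta}]^T [M] \leq [\lambda_{rs}]^T [M]$$

when compared to all (N + 1)Q lambda vectors.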

The  detection  algorithm  presented  here  is  numerically 
attractive  since  all  (N  +  1)Q  lambda  vectors  for  the  tem¬ 
plate  densities  can  be  precomputed.  The  only  on-line  compu¬ 
tations  required  then  are  the  information  function  expected 
value  measurements  and  (N  +  1)Q  vector  multiplications  or 
dot  products. 
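The on-line part of the rule can be sketched in a few lines. This is a minimal illustration, assuming the template lambda vectors have already been precomputed and stored under their (l, k) indices; the function name `detect` and all numerical values are hypothetical and are not taken from the test set.

```python
def detect(lambda_vectors, measurements):
    """Return the (l, k) index whose lambda vector gives the smallest dot
    product with the augmented measurement vector [1, m_1, ..., m_t]."""
    augmented = [1.0] + list(measurements)

    def score(index):
        return sum(a * b for a, b in zip(lambda_vectors[index], augmented))

    return min(lambda_vectors, key=score)

# Hypothetical precomputed vectors [lambda_0, lambda_1, lambda_2] for two templates:
# l = 0 is a clutter-only template, l = 1 is a target-in-clutter template.
lambda_vectors = {
    (0, 1): [0.1, 0.5, -0.2],
    (1, 1): [0.3, -1.0, 0.4],
}
measurements = [0.6, 0.1]          # hypothetical m_1, m_2 measured from the cell

best = detect(lambda_vectors, measurements)   # index of the closest template
```

Because the term H(e,p) is common to every template, only these dot products need to be compared; the first component of the winning index then gives the class decision.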


Chapter  IV.  Information  Functions 

Introduction 

The goal of the minimum cross-entropy detection algorithm
is the identification of objects contained in information
cells independent of their position and orientation within
the cell. To meet this goal and complete the detection rule
definition, a set of orthonormal image moments will be
developed and referenced to a standard coordinate system to
completely define the information functions $f_j(x,y)$. The
number and form of the information functions will then partly
determine the accuracy and resulting cross-entropy $H(e,t^{(lk)})$
distances between the approximate scene density e(x,y) and
the (N + 1)Q template densities $t^{(lk)}(x,y)$.

Image  Moments 

The concept of moments is used extensively in classical
mechanics and statistics. In this dissertation, the
two-dimensional (r + s)th order raw moments of the normalized
information cell irradiance distribution q(x,y) are defined
in terms of Riemann integrals

$$u_{rs} = \iint_{cell} x^r y^s\,q(x,y)\,dx\,dy$$

The irradiance distribution is a bounded function that can
have nonzero values only in a finite part of the xy plane.
Because of these irradiance distribution function
characteristics, moments of all orders exist and the double moment
sequence $\{u_{rs}\}$ is uniquely determined by the density q(x,y)
and conversely q(x,y) is uniquely determined by $\{u_{rs}\}$
(Hu, 1962).
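For sampled imagery the double integral becomes a double sum. The following sketch is illustrative only: it assumes the cell is given as a small grid of nonnegative irradiance samples with known sample coordinates, and the helper names `normalize` and `raw_moment` are hypothetical.

```python
def normalize(image):
    """Scale nonnegative irradiance samples so that they sum to one."""
    total = float(sum(sum(row) for row in image))
    return [[v / total for v in row] for row in image]

def raw_moment(q, xs, ys, r, s):
    """Discrete stand-in for u_rs: sum of x^r * y^s * q over the cell samples.
    q[i][j] is the normalized sample at coordinates (xs[j], ys[i])."""
    return sum(q[i][j] * xs[j] ** r * ys[i] ** s
               for i in range(len(ys)) for j in range(len(xs)))

image = [[0, 2],
         [1, 1]]                    # hypothetical 2 x 2 irradiance samples
xs = [-0.5, 0.5]                    # column coordinates
ys = [-0.5, 0.5]                    # row coordinates

q = normalize(image)
u00 = raw_moment(q, xs, ys, 0, 0)   # total image power after normalization
u10 = raw_moment(q, xs, ys, 1, 0)
u01 = raw_moment(q, xs, ys, 0, 1)
```

After normalization the zero-order moment is one, and the first-order moments give the centroid of the sampled cell directly.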

The low-order moments can be used to define a standard
coordinate system about which the moment sequence will be
invariant. With the "target" as the predominant feature in
the information cell, where the term "target" also models a
clutter configuration with the target of interest, this
standard coordinate system will be invariant to changes in
intensity, orientation and location of the "target." The
zero-order moment is given by

$$u_{00} = \iint_{cell} q(x,y)\,dx\,dy$$

and represents the total image power. The image power is
normalized to one as required of a probability density and
this also provides a standard density that is invariant to
uniform intensity variations. The first-order moments

$$u_{10} = \iint_{cell} x\,q(x,y)\,dx\,dy$$

$$u_{01} = \iint_{cell} y\,q(x,y)\,dx\,dy$$

can be used to define central moments $v_{rs}$ which are
invariant to translation of the "target" within the information
cell. These first-order moments locate the centroid of the
image irradiance distribution, i.e. $\bar{x} = u_{10}/u_{00}$, $\bar{y} = u_{01}/u_{00}$,
and the central moments are then defined about the centroid
as

$$v_{rs} = \iint_{cell} (x - \bar{x})^r (y - \bar{y})^s\,q(x,y)\,dx\,dy$$

From the definition of central moments it is easy to express
the central moments in terms of the raw moments. For example,
the first three moment orders are related by

Zero Order:

$$v_{00} = u_{00}$$

First Order:

$$v_{10} = u_{10} - \bar{x}\,u_{00} = 0$$

$$v_{01} = u_{01} - \bar{y}\,u_{00} = 0$$

Second Order:

$$v_{20} = u_{20} - \bar{x}^2 u_{00}$$

$$v_{11} = u_{11} - \bar{x}\bar{y}\,u_{00}$$

$$v_{02} = u_{02} - \bar{y}^2 u_{00}$$

A general formula for calculating the central moments in
terms of the raw moments can be found using the binomial
expansion $(a + b)^n = \sum_{k=0}^{n}\binom{n}{k}a^{n-k}b^k$. The resulting general
expression for converting raw moments into central moments is

$$v_{rs} = \sum_{j=0}^{r}\sum_{k=0}^{s}\binom{r}{j}\binom{s}{k}(-\bar{x})^{r-j}(-\bar{y})^{s-k}\,u_{jk}$$

where the notation $\binom{a}{b}$ denotes the usual binomial coefficient
and equals a!/b!(a - b)!. Kanazawa also provides a FORTRAN
program (Center) to calculate two-dimensional central moments
from a set of two-dimensional raw moments using an alternate
iterative relationship (Kanazawa, 1980:106).
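The binomial conversion from raw to central moments can be sketched directly. This is an illustrative Python version, not Kanazawa's FORTRAN program or its iterative relationship; the function name `central_from_raw`, the dictionary layout and the sample values are assumptions made for the example.

```python
from math import comb

def central_from_raw(u, r, s):
    """Central moment v_rs from raw moments via the binomial expansion.
    u maps (j, k) to the raw moment u_jk and must hold every j <= r, k <= s,
    together with the first-order moments used to locate the centroid."""
    xbar = u[(1, 0)] / u[(0, 0)]
    ybar = u[(0, 1)] / u[(0, 0)]
    return sum(comb(r, j) * comb(s, k)
               * (-xbar) ** (r - j) * (-ybar) ** (s - k) * u[(j, k)]
               for j in range(r + 1) for k in range(s + 1))

# Raw moments of a hypothetical two-point density:
# mass 0.5 at (1, 2) and mass 0.5 at (3, 4).
u = {(0, 0): 1.0, (1, 0): 2.0, (0, 1): 3.0,
     (2, 0): 5.0, (1, 1): 7.0, (0, 2): 10.0}

v10 = central_from_raw(u, 1, 0)
v20 = central_from_raw(u, 2, 0)
v11 = central_from_raw(u, 1, 1)
v02 = central_from_raw(u, 0, 2)
```

For this example the first-order central moment vanishes and each second-order central moment equals one, matching the second-order relations listed above (for instance 5 - 2^2 = 1).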

Using second order moments, a second image invariant,
the angle of minimum moment of inertia $\theta$, can be used in
addition to the center of mass given by the centroids. The
quantities $\bar{x}$, $\bar{y}$ and $\theta$ together define an invariant reference
frame for any information cell. In terms of raw moments the
angle of minimum moment of inertia is defined by

$$\theta = \frac{1}{2}\tan^{-1}\left[\frac{2(u_{00}u_{11} - u_{10}u_{01})}{(u_{00}u_{20} - u_{10}^2) - (u_{00}u_{02} - u_{01}^2)}\right]$$

and defines a region's orientation within a two-fold
degeneracy. The use of central moments converts the image
invariants into the more intuitive concept of an invariant image
ellipse. The second-order central moments

$$v_{20} = \iint_{cell} (x - \bar{x})^2\,q(x,y)\,dx\,dy$$

$$v_{11} = \iint_{cell} (x - \bar{x})(y - \bar{y})\,q(x,y)\,dx\,dy$$

$$v_{02} = \iint_{cell} (y - \bar{y})^2\,q(x,y)\,dx\,dy$$


characterize the size and orientation of the image. Using
only central moments up through second order, the original
image cannot be discriminated from a constant irradiance
ellipse having definite size, orientation and eccentricity
while centered at the image centroid (Teague, 1980). The
semi-major axis x' and the semi-minor axis y' of the
ellipse are shown in Figure 4.1 and define the principal axes
of the pattern. Moments defined using the principal axes of
the pattern are invariant to rotation and translation. Using
central moments, the angle of minimum moment of inertia $\theta$
reduces to the angle $\phi$ that defines a rotation from the
original x axis to the semi-major axis x' of the image
ellipse. The tilt angle $\phi$ is defined by

$$\phi = \frac{1}{2}\tan^{-1}\left[\frac{2v_{11}}{v_{20} - v_{02}}\right]$$

where $-\pi/2 \leq \tan^{-1}(x) \leq \pi/2$. There is an ambiguity in the
tilt angle $\phi$ which can be resolved by selecting $\phi$ as the
angle between the x axis and the semi-major axis of the
ellipse or, as defined in Figure 4.1, having the image
parameter a always greater than or equal to the parameter b.

Fig. 4.1. Image Ellipse
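One common way to resolve the ambiguity in the relation tan 2φ = 2v11/(v20 - v02) numerically is to use a two-argument arctangent, which keeps track of the signs of the numerator and denominator separately. The sketch below is an assumption about implementation rather than the procedure used in this dissertation; the function name `tilt_angle` is hypothetical.

```python
import math

def tilt_angle(v20, v11, v02):
    """Angle from the x axis to the semi-major axis of the image ellipse.
    atan2 returns a value in (-pi, pi], so the result lies in (-pi/2, pi/2]."""
    return 0.5 * math.atan2(2.0 * v11, v20 - v02)

diagonal = tilt_angle(1.0, 1.0, 1.0)     # spread along the line y = x
horizontal = tilt_angle(4.0, 0.0, 1.0)   # elongated along the x axis
vertical = tilt_angle(1.0, 0.0, 4.0)     # elongated along the y axis
```

The last case shows why the sign bookkeeping matters: a single-argument arctangent of 2v11/(v20 - v02) gives zero for both the horizontal and the vertical pattern, while the two-argument form separates them.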
The rotation to the invariant principal axes of the
pattern corresponds to the orthogonal transformation

$$x' = x\cos\phi + y\sin\phi$$

$$y' = -x\sin\phi + y\cos\phi$$


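Using this orthogonal transformation a general expression for
invariant moments can be defined in terms of a set of central
moments $\{v_{rs}\}$ as

$$w_{rs} = \iint_{cell} (x\cos\phi + y\sin\phi)^r\,(-x\sin\phi + y\cos\phi)^s\,q(x,y)\,dx\,dy$$

where x and y are measured from the image centroid. Again
using the binomial expansion results in the expression

$$w_{rs} = \iint_{cell}\sum_{j=0}^{r}\sum_{k=0}^{s}(-1)^{s-k}\binom{r}{j}\binom{s}{k}(\cos\phi)^{r-j+k}(\sin\phi)^{s+j-k}\,x^{r-j+s-k}\,y^{j+k}\,q(x,y)\,dx\,dy$$

which is equivalent to

$$w_{rs} = \sum_{j=0}^{r}\sum_{k=0}^{s}(-1)^{s-k}\binom{r}{j}\binom{s}{k}(\cos\phi)^{r-j+k}(\sin\phi)^{s+j-k}\,v_{r-j+s-k,\;j+k}$$

The general transformation expression shows that the set of
central moments $\{v_{rs}\}$ of order N = r + s transform into the
set of invariant moments $\{w_{rs}\}$ of the same order N = r + s.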

In summary then the raw moments $u_{rs}$ of order two and
below have been used to construct an invariant reference
frame called the principal axes of the pattern which is
invariant to uniform intensity variations, translation and
rotation of the "target" within the information cell.
General expressions were obtained for computing the invariant
moments  of  any  order  from  a  set  of  raw  moments  of  the  same 
order.  The  raw  image  moments  are  first  converted  to  central 
moments  and  the  central  moments  are  then  mapped  into  invari¬ 
ant  moments  referenced  to  the  principal  axes  of  the  pattern. 
The  translation  and  rotation  of  the  set  of  raw  moments  will 
be  much  faster  numerically  than  translating  and  rotating  the 
complete  image  before  computing  the  set  of  invariant  moments. 
Figure  4.2  illustrates  the  two  equivalent  methods  of  obtain¬ 
ing  the  desired  set  of  invariant  moments. 
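The raw-to-central-to-invariant mapping summarized above can be sketched numerically. The following is a minimal illustration, not the dissertation's program: the function names are hypothetical, the moment array layout `u[r, s]` is an assumption, and the tilt angle is taken as `0.5 * atan2(2 v11, v20 - v02)`, one common way of choosing the semi-major axis.

```python
from math import atan2, comb, cos, sin

import numpy as np


def central_moments(u):
    """Translate raw moments u[r, s] to the centroid (binomial expansion)."""
    order = u.shape[0] - 1
    xb, yb = u[1, 0] / u[0, 0], u[0, 1] / u[0, 0]  # centroid coordinates
    v = np.zeros_like(u)
    for r in range(order + 1):
        for s in range(order + 1):
            v[r, s] = sum(
                comb(r, j) * comb(s, k)
                * (-xb) ** (r - j) * (-yb) ** (s - k) * u[j, k]
                for j in range(r + 1) for k in range(s + 1)
            )
    return v


def invariant_moments(v):
    """Rotate central moments v[r, s] to the principal axes.

    Only entries with r + s <= order are filled; the rest stay zero.
    Assumes order >= 2 so that the tilt angle can be computed.
    """
    order = v.shape[0] - 1
    phi = 0.5 * atan2(2.0 * v[1, 1], v[2, 0] - v[0, 2])  # assumed tilt choice
    c, sn = cos(phi), sin(phi)
    w = np.zeros_like(v)
    for r in range(order + 1):
        for s in range(order + 1 - r):
            w[r, s] = sum(
                (-1) ** (s - k) * comb(r, j) * comb(s, k)
                * c ** (r - j + k) * sn ** (s + j - k)
                * v[r + s - j - k, j + k]
                for j in range(r + 1) for k in range(s + 1)
            )
    return w
```

With this choice of angle the mixed second moment of the rotated set vanishes and the moment along the first principal axis is the larger of the two, which matches the convention that parameter a is not smaller than parameter b.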

Orthonormal  Moments 

From functional analysis it is well-known that the general
definition of the moment operator

$$u_{rs} = \iint_{cell} x^{r} y^{s}\, q(x,y)\,dx\,dy$$

has the form of a projection of the normalized irradiance
function q(x,y) onto the subspace of monomials $\{x^{r}y^{s}\}$.

Fig. 4.2. Alternate Methods to Compute Invariant Moments

The Weierstrass approximation theorem shows that the monomials
form a complete basis set for a series expansion of q(x,y),
but the monomials do not form an orthogonal basis set. Using
the Gram-Schmidt process on the linearly independent set
$\{1, x, x^{2}, \ldots\}$ produces a well-known orthonormal set $\{O_n(x)\}$
over the interval $-1 \le x \le 1$ that is useful in constructing
a two-dimensional orthonormal set based on the monomials
(Kreyszig, 1978:176). The orthonormal elements have the
general form

$$O_n(x) = \sqrt{\frac{2n+1}{2}}\; P_n(x)$$

where $P_n(x)$ is the Legendre polynomial of order n. Using
the orthonormal elements $O_n(x)$ to produce information
functions as linear combinations of the monomials allows q(x,y)
to simultaneously have both of the following orthogonal
series expansions:

$$q(x,y) = \sum_{m=0}^{\infty}\sum_{n=0}^{\infty} M_{mn}\, O_m(x)\, O_n(y)$$

and

$$\ln q(x,y) = -\sum_{m=0}^{\infty}\sum_{n=0}^{\infty} \lambda_{mn}\, O_m(x)\, O_n(y)$$

This  model  of  the  image  density  function  is  an  extension  of 
the  model  developed  by  Neyman  (Neyman,  1937)  and  used  in 
several  articles  by  Crain  (Crain,  1977.  197^.  1976)  dealing 
with approximating univariate probability densities. The
Neyman model for the unknown information cell density function
q(x,y) corresponds to the minimum cross-entropy (maximum
entropy) density

$$q(x,y) = \exp\left[-\sum_{i=0}^{\infty} \lambda_i f_i(x,y)\right]$$

with an infinite number of information functions $f_i(x,y)$
formed as a product of normalized Legendre polynomials. The
Neyman infinite series expansion expressed as a product of
normalized Legendre polynomials is basically a summation by
infinite rows of a matrix of series terms and must be
approximated to be of any practical value. The approximation is
obtained by summing along finite diagonals and truncating at
a finite order $N_{max}$ to obtain the expressions:

$$q(x,y) \approx \sum_{j=0}^{N_{max}}\sum_{n=0}^{j} M_{j-n,n}\, O_{j-n}(x)\, O_n(y)$$

and

$$q(x,y) \approx \exp\left[-\sum_{j=0}^{N_{max}}\sum_{n=0}^{j} \lambda_{j-n,n}\, O_{j-n}(x)\, O_n(y)\right]$$

This result is used by Teague (Teague, 1980) and is the basic
equation required to approximate the unknown information cell
density q(x,y). The approximation for the information cell
density also corresponds to a minimum cross-entropy (maximum
entropy) density with $(N_{max} + 1)(N_{max} + 2)/2$ information
functions $f_i(x,y)$ formed as products of the normalized
Legendre polynomials. Due to the rotational properties of
moments, all moments of a given order must be included in the
series expansion and treated as having equal importance when
constructing an approximate density function. The number of
information functions required in an approximate density
function of order $N_{max}$ is shown in Figure 4.3, with each
diagonal line in the figure corresponding to a different moment
order, beginning with one function at zero order. The accuracy
of the truncated approximation for q(x,y) improves as
$N_{max}$ is increased; however, the numerical difficulty of
solving a large system of nonlinear equations for $[\Lambda]$ also
increases as the size of the corresponding lambda vector
grows with $N_{max}$. The selection of the moment order $N_{max}$
is thus a compromise between accuracy, as measured by the
cross-entropy $H(q,t^{(t)})$, and the numerical processing time required
to compute the set of lambda vectors $[\Lambda_i]$ and the augmented
measurement vectors [M].
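The count of information functions and their numbering along diagonals can be made concrete with a short sketch. The helper name is hypothetical, and the ordering within each diagonal (increasing order in y) is an assumption chosen to agree with the worked example for information function number 43 given later in this chapter.

```python
def information_function_indices(n_max):
    """List (m, n) index pairs of O_m(x) O_n(y), diagonal by diagonal.

    Diagonal `order` holds every pair with m + n == order, so the list has
    (n_max + 1)(n_max + 2) / 2 entries in total.
    """
    return [(order - n, n)
            for order in range(n_max + 1)
            for n in range(order + 1)]


pairs = information_function_indices(12)
print(len(pairs))   # number of information functions for a twelfth order model
print(pairs[43])    # index pair of information function number 43
```

Under these assumptions a twelfth order model has 91 functions, and number 43 (counting from zero) is the pair (1, 7).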

The problem is then to select $N_{max}$ for the set of template
densities $t^{(t)}(x,y)$. The template density representation
must be accurate enough to provide a small probability
of error for the detection algorithm and also not require an
excessive amount of processing time. Cross-entropy $H(q,t^{(t)})$
serves as an information theoretic distance between the true
density q(x,y) and the model density $t^{(t)}(x,y)$. Because
$H(q,t^{(t)})$ is nonnegative and $H(q,t^{(t)}) = 0$ if and only if
$q(x,y) = t^{(t)}(x,y)$ almost everywhere, cross-entropy between
the model and the true density can be used to select $N_{max}$,
the required model order. The cross-entropy distance is
given by

$$H(q,t^{(t)}) = \iint q(x,y)\,\ln\frac{q(x,y)}{t^{(t)}(x,y)}\,dx\,dy$$

Fig. 4.3. Information Function Construction

Substituting the Neyman and minimum cross-entropy density
forms gives the expression

$$H(q,t^{(t)}) = \iint q(x,y)\,\ln\frac{\exp\left[-\sum_{i=0}^{\infty}\beta_i f_i(x,y)\right]}{\exp\left[-\sum_{n=0}^{t}\alpha_n f_n(x,y)\right]}\,dx\,dy$$

Rearranging and taking expected values results in

$$H(q,t^{(t)}) = \sum_{i=0}^{t}(\alpha_i - \beta_i)\,m_i - \iint q(x,y)\sum_{i=t+1}^{\infty}\beta_i f_i(x,y)\,dx\,dy$$

which is known to approach zero as t approaches infinity.
With the infinite sum in the last expression converging to
some function $\beta(x,y)$, the cross-entropy distance measure
will take the form

$$H(q,t^{(t)}) = \sum_{i=0}^{t}(\alpha_i - \beta_i)\,m_i + C(t)$$

where C(t) is a constant for each value of t in the template
density expression. Since this expression for cross-entropy
cannot be evaluated analytically, in Figure 6.1
$H(q,t^{(t)})$ is numerically evaluated for a block tank template
and plotted as a function of t, the number of information
functions used in the template. In general, for the minimum
cross-entropy detection algorithm to be effective, the sum of
the expected values of the high order terms, or the constant
C(t), must convey a small amount of information. The
information content of the high order terms in turn depends on
the number and magnitude of the variations in the density
that is being approximated by the template. Because of this
target dependence, $H(q,t^{(t)})$ is evaluated numerically in
Chapter VI using a template with many abrupt and relatively
large changes in the density function to provide an approximate
worst case relationship between the number of information
functions and the resulting cross-entropy $H(q,t^{(t)})$.

Legendre  Polynomials 

To complete the information function definition, we need
explicit expressions for the Legendre polynomials that are
used to form $f_n(x,y)$. These orthogonal polynomials are defined
over the interval [-1,1] and have the general explicit
expression

$$P_n(x) = \frac{1}{2^{n}}\sum_{m=0}^{[n/2]} (-1)^{m}\,\frac{(2n-2m)!}{m!\,(n-m)!\,(n-2m)!}\; x^{n-2m}$$

where [n/2] denotes the greatest integer not exceeding n/2.
Legendre polynomials also satisfy the following recurrence
relation (Courant, 1953:86):

$$(n+1)\,P_{n+1}(x) - (2n+1)\,x\,P_n(x) + n\,P_{n-1}(x) = 0$$


Table I explicitly defines the first thirteen Legendre
polynomials and the required normalization factor $\sqrt{(2n+1)/2}$.
The Legendre polynomials shown can be used to define the
91 information functions required by Figure 4.3 for a twelfth
order approximation of the true density function. Figure 4.3
also shows how all 91 information functions are constructed
from the normalized Legendre polynomials. For example, the
44th information function, shown as number 43 on the figure,
is given by

$$f_{43}(x,y) = O_1(x)\,O_7(y)$$

and using the expressions given in Table I becomes

$$f_{43}(x,y) = (1.22x)(2.74)(26.81y^{7} - 43.31y^{5} + 19.69y^{3} - 2.19y)$$

or

$$f_{43}(x,y) = 89.31xy^{7} - 144.66xy^{5} + 65.76xy^{3} - 7.31xy$$

The expected value of this information function can be
written in terms of invariant moments as

$$89.31w_{17} - 144.66w_{15} + 65.76w_{13} - 7.31w_{11}$$

Table I. Legendre Polynomials

Symbol     sqrt((2n+1)/2)   Explicit Expression
P_0(x)     0.707            1
P_1(x)     1.22             x
P_2(x)     1.58             1.5x^2 - 0.5
P_3(x)     1.87             2.5x^3 - 1.5x
P_4(x)     2.12             4.38x^4 - 3.75x^2 + 0.38
P_5(x)     2.35             7.88x^5 - 8.75x^3 + 1.88x
P_6(x)     2.55             14.44x^6 - 19.69x^4 + 6.56x^2 - 0.31
P_7(x)     2.74             26.81x^7 - 43.31x^5 + 19.69x^3 - 2.19x
P_8(x)     2.92             50.27x^8 - 93.85x^6 + 54.15x^4 - 9.85x^2 + 0.27
P_9(x)     3.08             94.96x^9 - 201.09x^7 + 140.77x^5 - 36.09x^3 + 2.46x
P_10(x)    3.24             180.42x^10 - 427.31x^8 + 351.90x^6 - 117.31x^4
                            + 13.53x^2 - 0.24
P_11(x)    3.39             344.42x^11 - 902.05x^9 + 854.57x^7 - 351.88x^5
                            + 58.64x^3 - 2.60x
P_12(x)    3.54             660.25x^12 - 1894.68x^10 + 2030.05x^8
                            - 997.24x^6 + 219.98x^4 - 17.57x^2 + 0.22
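The step from an information function to its expected value in terms of invariant moments can be reproduced with a few lines of NumPy. This is only an illustrative sketch: the function names are hypothetical, and the moment array `w[r, s]` (the invariant moment of x^r y^s) is an assumed layout. Because it works from unrounded Legendre coefficients, its numbers are close to, but not identical with, the values obtained from the two-decimal entries of Table I.

```python
import numpy as np
from numpy.polynomial import legendre


def orthonormal_legendre_coeffs(n):
    """Power-series coefficients (lowest degree first) of
    O_n(x) = sqrt((2n + 1) / 2) * P_n(x)."""
    c = np.zeros(n + 1)
    c[n] = 1.0                                   # select P_n in the Legendre basis
    return np.sqrt((2 * n + 1) / 2.0) * legendre.leg2poly(c)


def expected_information_function(m, n, w):
    """Expected value of O_m(x) O_n(y) given moments w[r, s] = E[x^r y^s]."""
    a = orthonormal_legendre_coeffs(m)           # coefficients in x
    b = orthonormal_legendre_coeffs(n)           # coefficients in y
    return float(a @ w[: m + 1, : n + 1] @ b)


# Coefficient of x y^7 in O_1(x) O_7(y), from unrounded factors.
print(orthonormal_legendre_coeffs(1)[1] * orthonormal_legendre_coeffs(7)[7])
```

Since every expected value is a fixed linear combination of invariant moments, the coefficient vectors need to be computed only once and can then be applied to each new moment set.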

The measurement vector [M] required in the detection
algorithm is then completely defined by the set of invariant
moments $\{w_{mn}\}$.

In summary, a set of raw moments $\{u_{mn}\}$ is produced and
converted into central moments $\{v_{mn}\}$ about the pattern
centroid. The central moments are then rotated and become the
set of invariant moments $\{w_{mn}\}$ about the principal axes of
the pattern. The set of orthonormal moments $\{m_i\}$ that form
the measurement vector [M] are then computed as linear
combinations of these invariant moments. Given these definitions,
the detection algorithm is ready to process information
cells.

Chapter V. Numerical Techniques and Performance Analysis


The key to implementing the minimum cross-entropy detection
algorithm is the ability to find the correct Lagrange
multiplier vector for a given set of measurements. The problem
mathematically is to find $\Lambda = (\lambda_0, \lambda_1, \ldots, \lambda_t)^{T}$ such that

$$[F(\Lambda)] = \begin{bmatrix} F_0(\Lambda) \\ F_1(\Lambda) \\ \vdots \\ F_t(\Lambda) \end{bmatrix} = \begin{bmatrix} \iint e(x,y)\,dx\,dy - 1 \\ \iint f_1(x,y)\,e(x,y)\,dx\,dy - m_1 \\ \vdots \\ \iint f_t(x,y)\,e(x,y)\,dx\,dy - m_t \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

where $e(x,y) = p(x,y)\exp\{-\lambda_0 - \lambda_1 f_1(x,y) - \ldots - \lambda_t f_t(x,y)\}$
is the minimum cross-entropy density with a uniform prior
density. The (t + 1) constraints are nonlinear and, except
for a few restricted cases, cannot be solved directly for the
lambda vector. Several authors discuss iterative numerical
schemes for simultaneous solution of a system of nonlinear
equations. Johnson (Johnson, 1979b:24) provides a computer
program written in APL for solving discrete cross-entropy
minimization problems with arbitrary positive priors that is
based on the Newton-Raphson method. Gokhale and Kullback
(Gokhale, 1978) describe a somewhat different algorithm, also
based on the Newton-Raphson method, that has been implemented
in PL/1. Agmon, Alhassid and Levine (Agmon, 1979:250)
describe yet another discrete cross-entropy minimization
algorithm using a uniform prior and a FORTRAN implementation.

Miller also provides an alternate FORTRAN implementation in
one dimension that is based on the Newton-Raphson method
(Miller, 1980:45).

The Newton method is an iterative scheme based on the
relationship:

$$[\Delta\Lambda] = [\Lambda^{(n)}] - [\Lambda^{(n+1)}] = [J]^{-1}\,[F(\Lambda^{(n)})]$$

where $[\Lambda^{(n)}]$ is the Lagrange multiplier vector lambda for
the nth iteration and [J] is the Jacobian matrix for
$[F(\Lambda^{(n)})]$. The initial estimate $[\Lambda^{(0)}]$ is selected and the
equation solved for $[\Lambda^{(1)}]$. The procedure repeats for $[\Lambda^{(2)}]$,
$[\Lambda^{(3)}]$, ..., $[\Lambda^{(n)}]$, $[\Lambda^{(n+1)}]$ until the difference $[\Delta\Lambda]$ is less
than a predefined value, which is taken to indicate that
convergence has occurred. The equation that must be solved
numerically to implement the Newton method requires an evaluation
of the Jacobian matrix [J] during every iteration for a new
lambda vector. The Jacobian matrix has terms of the form
$\partial F_i(\Lambda)/\partial\lambda_j$ and [J] is then a (t + 1) x (t + 1) symmetric
matrix. Convergence and rate of convergence of the Newton
algorithm are dependent on the initial estimate $[\Lambda^{(0)}]$. Many
authors address the theoretical convergence criteria of the
Newton method (see (Ortega, 1970) and (Saaty, 1964)), which in
general define a neighborhood about the solution vector where
convergence is assured if the initial estimate falls in this
neighborhood.
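The Newton relationship can be illustrated on a small discrete problem. The sketch below is not one of the cited APL, PL/1 or FORTRAN programs: the function name, the array layout (row 0 of `f` is the constant function, so that `lam[0]` plays the role of the normalizing multiplier) and the stopping rule are assumptions made for illustration only.

```python
import numpy as np


def newton_lambda(f, p, m, tol=1e-10, max_iter=50):
    """Solve F(lam) = 0 for e = p * exp(-lam . f) on a discrete sample set.

    f : (t + 1, K) information functions at K sample points, f[0] == 1
    p : (K,) prior weights
    m : (t + 1,) target moments, with m[0] == 1 (normalization constraint)
    """
    lam = np.zeros(f.shape[0])               # initial estimate Lambda^(0)
    for _ in range(max_iter):
        e = p * np.exp(-lam @ f)             # current density at the samples
        F = f @ e - m                        # constraint residuals F_i
        J = -(f * e) @ f.T                   # Jacobian dF_i / dlam_j (symmetric)
        delta = np.linalg.solve(J, F)        # [dLambda] = [J]^-1 [F]
        lam = lam - delta                    # Lambda^(n+1) = Lambda^(n) - dLambda
        if np.max(np.abs(delta)) < tol:
            return lam
    raise RuntimeError("Newton iteration did not converge")
```

For a handful of well-scaled constraints this converges in a few iterations from a zero start; with many constraints the linear solve inherits the conditioning of the Jacobian, which is the difficulty the text goes on to describe.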

The Newton method has thus been used almost exclusively
in solution schemes appearing in the technical literature.
Past applications, however, worked with a small number of
constraints and have encountered problems with ill-conditioning
in the computer generated Jacobian matrix and selection
of an appropriate initial estimate of the lambda vector
(Miller, 1980:50). Using 91 constraints accentuates these
numerical problems to the point that the second-order Newton
method (Bodes, 1978) must be abandoned for lower order methods
that do not use derivative information.

Lambda  Vector  Solution 

The processing required to find the lambda vector is the
main burden of the minimum cross-entropy detection algorithm.
The lambda vector for each template density is, however,
precomputed and stored for use in the detection algorithm. A
zero order method was selected to solve for the lambda vector
since the procedure must only be accomplished once for each
template density and most numerical problems are avoided.

The Cyclic Coordinate Method (Bazaraa, 1979:271) is a
multidimensional search procedure that does not use derivatives.
The only required information is that $\Lambda \in L$ where L has
the form $L = \{\Lambda : a_i \le \lambda_i \le b_i\}$. The search procedure requires
only a defined search interval and has been implemented on
the Data General Eclipse S/250 Integral Array Processor.

When performed on the array processor, the cyclic coordinate
search method produces results faster than a comparably
dimensioned Newton algorithm written in FORTRAN. The search
problem is then: given the vector of constraint relationships

$$F_i(\Lambda) = \iint_{cell} f_i(x,y)\,p(x,y)\exp\left[-\sum_{u=0}^{t}\lambda_u f_u(x,y)\right]dx\,dy - m_i = 0 \qquad (i = 1 \ldots t)$$

find the lambda vector required to define e(x,y). The
discrete approximation of this equation can then be written in
the form

$$m_i - \sum_{k}\sum_{n} f_i(x_k,y_n)\,p(x_k,y_n)\exp\left[-\sum_{u=0}^{t}\lambda_u f_u(x_k,y_n)\right] = 0 \qquad (i = 1 \ldots t)$$

Using a uniform prior density and then canceling terms
results in the equivalent expression:

$$m_i = \frac{\sum_{k}\sum_{n} f_i(x_k,y_n)\exp\left[-\sum_{u=1}^{t}\lambda_u f_u(x_k,y_n)\right]}{\sum_{k}\sum_{n}\exp\left[-\sum_{u=1}^{t}\lambda_u f_u(x_k,y_n)\right]} \qquad (i = 1 \ldots t)$$

The t = 91 equations defining the Lagrange parameter vector
are implicit and nonlinear. The direct numerical solution
of this vector constraint equation would be computationally
cumbersome. However, using a result derived by
Agmon et al. (Agmon, 1979), this problem can be recast as a
simpler variational problem. The technique requires that a
"potential" function which is concave for any trial set of
Lagrange parameters be defined. The values of the Lagrange
multiplier parameters can then be determined as the set which
minimizes the potential. Agmon et al. provide the following
lemma, which has direct application to the nonlinear constraint
equation given above:


Let $\Omega \subset R^{t}$ be a simply connected domain. Let $F(\Lambda)$
be a continuously differentiable vector function on $\Omega$. Denote its
Jacobian by J, that is $J_{ij} = \partial F_i/\partial\lambda_j$, and suppose it is a
symmetric positive definite matrix. The problem of solving
the set of nonlinear equations $F(\Lambda) = 0$ is equivalent to
finding a minimum of a concave scalar potential function $\Phi(\Lambda)$.

The solution of the system of nonlinear equations
$F(\Lambda) = 0$ is then found to be equivalent to minimizing the
following scalar potential function

$$\Phi(\Lambda) = \ln\left\{\sum_{k}\sum_{n}\exp\left[-\sum_{u=1}^{t}\lambda_u f_u(x_k,y_n)\right]\right\} + \sum_{u=1}^{t}\lambda_u m_u$$

The solution program Lambda assumes $L = \{\Lambda : -25 \le \lambda_i \le 25\}$
and has always converged to a minimum for $\Phi(\Lambda)$. Since the
potential function is concave, should a component of the solution
vector lie outside this interval the algorithm will select
a value of ±25 and signal that the search interval
should be expanded. The functional minimization program uses
a 32 x 32 grid to compute $\Phi(\Lambda)$ and a sequence of decreasing
search intervals to reach the solution vector.

The final component $\lambda_0$ of the solution vector is found
from the requirement that the resulting minimum cross-entropy
density e(x,y) integrate to one. The result is then

$$\lambda_0 = \ln\iint_{cell} p(x,y)\exp\left[-\sum_{u=1}^{t}\lambda_u f_u(x,y)\right]dx\,dy$$

To implement the functional minimization scheme and produce
image moments requires an effective quadrature scheme. The
quadrature algorithm used in all programs will be developed
in the next section.
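A derivative-free cyclic coordinate search of the kind described in this section can be sketched as follows. This is not the program Lambda: the potential is written in the log-sum form commonly associated with Agmon et al. (an assumption about the exact expression used here), each coordinate is searched by golden section over a fixed interval rather than by a sequence of decreasing intervals, and all names are hypothetical.

```python
import numpy as np


def potential(lam, f, m):
    """ln(sum_k exp(-lam . f_k)) + lam . m, evaluated stably."""
    z = -lam @ f
    zmax = z.max()
    return zmax + np.log(np.sum(np.exp(z - zmax))) + lam @ m


def golden_section(g, a, b, tol=1e-10):
    """Minimize a unimodal scalar function g on [a, b] without derivatives."""
    r = (np.sqrt(5.0) - 1.0) / 2.0
    c, d = b - r * (b - a), a + r * (b - a)
    gc, gd = g(c), g(d)
    while b - a > tol:
        if gc < gd:                      # minimum lies in [a, d]
            b, d, gd = d, c, gc
            c = b - r * (b - a)
            gc = g(c)
        else:                            # minimum lies in [c, b]
            a, c, gc = c, d, gd
            d = a + r * (b - a)
            gd = g(d)
    return 0.5 * (a + b)


def cyclic_coordinate(f, m, bound=25.0, sweeps=200, tol=1e-6):
    """Minimize the potential one multiplier at a time over [-bound, bound]."""
    lam = np.zeros(f.shape[0])
    for _ in range(sweeps):
        previous = lam.copy()
        for i in range(lam.size):
            def along_axis(value, i=i):
                trial = lam.copy()
                trial[i] = value
                return potential(trial, f, m)
            lam[i] = golden_section(along_axis, -bound, bound)
        if np.max(np.abs(lam - previous)) < tol:
            break
    return lam
```

Because the potential curves upward in every direction, each one-dimensional search has a single minimum, so the sweep can only decrease the potential. A component that settles at ±bound is the signal, as in the text, that the search interval is too small.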


Numerical  Quadrature 

The first Newton-Cotes formula, known as the Trapezoidal
Rule, gives a relationship which forms the basis of an effective
quadrature scheme. In one dimension, with the interval
of integration divided into n parts, the Trapezoidal Rule
states that (Young, 1972:371):

$$\int_{a}^{b} f(x)\,dx \approx h\left[\tfrac{1}{2}f_0 + \tfrac{1}{2}f_n + \sum_{k=1}^{n-1} f_k\right]$$

where $f_k = f(a + kh)$ and h = (b - a)/n. The first processing
requirement for this quadrature scheme is the production
of a set of raw moments $\{u_{pq}\}$ from the unknown image density
function q(x,y) or template density t(x,y). Raw moments
are then transformed into orthonormal Legendre moments
[M] for use in the detection algorithm or the iterative cyclic
coordinate method.
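For reference, the composite Trapezoidal Rule in one dimension amounts to only a few lines. The function name is hypothetical and the sketch assumes n equal subintervals.

```python
def trapezoid(f, a, b, n):
    """Composite Trapezoidal Rule for the integral of f over [a, b]."""
    h = (b - a) / n
    interior = sum(f(a + k * h) for k in range(1, n))   # f_1 ... f_(n-1)
    return h * (0.5 * (f(a) + f(b)) + interior)
```

The rule is exact for linear integrands, and its error for smooth integrands falls off as the square of h.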

The two-dimensional raw moment $u_{pq}$ is defined in Cartesian
coordinates as

$$u_{pq} = \int_{-c}^{c}\left[\int_{-c}^{c} x^{p} y^{q}\, q(x,y)\,dy\right]dx$$

which can be rewritten as

$$u_{pq} = \int_{-c}^{c} x^{p}\, d_q(x)\,dx$$

where

$$d_q(x) = \int_{-c}^{c} y^{q}\, q(x,y)\,dy$$

The one dimensional Newton-Cotes formula will be used to
approximate $d_q(x)$. Using only the first term of the
Trapezoidal Rule gives

$$d_q(x_i) \approx h\left[\tfrac{1}{2}y_0^{q}\,q(x_i,y_0) + \tfrac{1}{2}y_n^{q}\,q(x_i,y_n) + \sum_{j=1}^{n-1} y_j^{q}\,q(x_i,y_j)\right]$$

The approximation can now be applied again to produce the
raw moments $u_{pq}$. The raw moments are then:

$$u_{pq} \approx h\left[\tfrac{1}{2}x_0^{p}\,d_q(x_0) + \tfrac{1}{2}x_n^{p}\,d_q(x_n) + \sum_{k=1}^{n-1} x_k^{p}\,d_q(x_k)\right]$$

Completely expanding this approximation gives a clearer picture
of the factors involved in $u_{pq}$. The expanded approximation
for the raw moments $u_{pq}$ is:

$$u_{pq} \approx \frac{h^{2}}{4}\left[x_0^{p}y_0^{q}\,q(x_0,y_0) + x_0^{p}y_n^{q}\,q(x_0,y_n) + x_n^{p}y_0^{q}\,q(x_n,y_0) + x_n^{p}y_n^{q}\,q(x_n,y_n)\right]$$
$$\qquad + \frac{h^{2}}{2}\left[x_0^{p}\sum_{j=1}^{n-1} y_j^{q}\,q(x_0,y_j) + x_n^{p}\sum_{j=1}^{n-1} y_j^{q}\,q(x_n,y_j) + y_0^{q}\sum_{k=1}^{n-1} x_k^{p}\,q(x_k,y_0) + y_n^{q}\sum_{k=1}^{n-1} x_k^{p}\,q(x_k,y_n)\right]$$
$$\qquad + h^{2}\sum_{k=1}^{n-1}\sum_{j=1}^{n-1} x_k^{p}y_j^{q}\,q(x_k,y_j)$$

The first grouping of terms in the expression represents the
contribution from the four corner points of the sampled unknown
(or template) density array $q(x_k,y_j)$ (or $t(x_k,y_j)$).

The second grouping of terms represents the contribution from
the remaining "edge" sample points. The last term then
represents all the interior sample points and, when using a
256 x 256 sampled density array, represents 98.44% of the possible
contribution to the moments. When a white border is
used with the density matrix the "edge" terms will make no
contribution at all to the moment approximation and can be
ignored. With these insights the raw moment approximation
used in this dissertation will be

$$u_{pq} \approx h^{2}\sum_{k=0}^{n}\sum_{j=0}^{n} x_k^{p}\,y_j^{q}\,q(x_k,y_j)$$

For computer computation using this approximation, all
moments through order L can be computed with two matrix
multiplications. The matrix equation for the raw moment
matrix is:

$$[U] = \begin{bmatrix} u_{00} & u_{01} & u_{02} & \cdots & u_{0L} \\ \vdots & & & & \end{bmatrix}$$

With the Cartesian coordinate system origin centered in the
bordered image density function and the image normalized to
$-1 \le x,y \le +1$, the subroutine moment in Figure 5.1 computes
the normalized raw moment array.
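One way to realize the "two matrix multiplications" is to sandwich the sampled density between two matrices of coordinate powers. The sketch below is an assumption about how such a routine could look, not a transcription of the subroutine moment; the function name and the `q[k, j] = q(x_k, y_j)` layout are hypothetical.

```python
import numpy as np


def raw_moment_matrix(q, max_order):
    """U[p, r] ~ h^2 * sum_k sum_j x_k^p y_j^r q[k, j] on [-1, 1] x [-1, 1].

    q is a square array of samples with q[k, j] = q(x_k, y_j); a zero border
    is assumed so that edge weighting can be ignored.
    """
    n = q.shape[0]
    x = np.linspace(-1.0, 1.0, n)
    h = x[1] - x[0]
    powers = np.vander(x, max_order + 1, increasing=True)   # powers[k, p] = x_k^p
    return h * h * (powers.T @ q @ powers)                  # two matrix products
```

All moments through the requested order come out of a single call; only the entries with p + r not exceeding the model order are needed for the information functions.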


Detection  Algorithm  Infrastructure 


The basic software modules required in the minimum cross-entropy
detection algorithm are shown in Figure 5.1. The diagram
shows both on-line (solid line) and pre-processing (broken
line) software modules with the interdependence between
the two types of processing. When using the detection
algorithm to process information cells, the (N + 1)Q lambda
vectors $(\Lambda^{(1)}, \ldots, \Lambda^{((N+1)Q)})$ will be stored as constants
for use in the detection program.

The pre-processing starts with a set of (N + 1)(Q) template
densities used to "train" the detection algorithm.
These template densities represent pure clutter and N targets
of current interest, all superimposed on the Q clutter
backgrounds. The analog to digital program (A - D) produces
(N + 1)Q arrays of sample values, each of dimension 256 x 256.
The sample values within the arrays have also been quantized
to 16 levels. Each array then contains 65,536 integer values
in the range 0-15. The program moment produces a set of 91
raw moments $\{u_{pq}\}$ for each template array. The programs
center, rotate and Legendre map the moments for each template
into Legendre moments about the principal axis. The final
preprocessing program lambda iteratively solves for the 91 λ's
required to define the minimum cross-entropy density for each
template and also used in the detection algorithm.

Fig. 5.1. Software Module Interdependence

The on-line processing for each unknown information cell
simply produces a set of 91 Legendre moments [M] using the
programs outlined in the preprocessing section. With the
Legendre moment vector and the (N + 1)Q lambda vectors the
detection program produces (N + 1)Q dot products. The
matrix of dot product values is searched for the smallest
element and the row number of that element determines the
classification decision for that information cell. The on-line
processing is then repeated for each new information cell.
The lambda values, however, remain fixed for all information
cells presented to the detection algorithm and thus
preprocessing is performed only once for a given set of template
densities.
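The on-line decision step reduces to a matrix-vector product followed by a search for the smallest entry. The sketch below is illustrative only: the function name is hypothetical, and the storage order of the lambda vectors (row `r * Q + c` holding class r on clutter background c, with one row of the score matrix per class) is an assumption.

```python
import numpy as np


def classify_cell(moments, lambdas, n_backgrounds):
    """Return (class_row, scores) for one information cell.

    moments       : (t + 1,) Legendre moment vector [M] of the cell
    lambdas       : ((N + 1) * Q, t + 1) stored template lambda vectors
    n_backgrounds : Q, the number of clutter backgrounds
    """
    scores = (lambdas @ moments).reshape(-1, n_backgrounds)  # (N + 1) x Q dot products
    row, _ = np.unravel_index(np.argmin(scores), scores.shape)
    return int(row), scores
```

Only the moment vector changes from cell to cell, so the stored lambda matrix can be loaded once and reused for every cell presented to the detector.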

Performance Analysis


Given the theory and now the software for a minimum
cross-entropy detection algorithm, an expected performance
analysis will complete the presentation of this detection
rule. The next chapter will then explore actual performance
on a set of test scenes and verify the impact of the salient
factors influencing probability of error. The analysis in
this section will attempt to use concepts from statistical
communication theory to identify the major factors determining
P(ε) in the minimum cross-entropy algorithm. Statistical
communication theory has as its goal the detection or
estimation of signals in the presence of noise, but because
of the difficulty of establishing useful statistical assumptions
(Duda, 1973:324) it has found few applications in scene
analysis.

With statistical communication theory techniques in mind,
the information cell density can be simply modeled as the sum
of two terms:

$$q(x,y) = s(x,y) + n(x,y)$$

The term s(x,y) represents the expected signal or the template
density used to train the detection algorithm, i.e. the
densities that correspond to the stored lambda vectors. The
other component n(x,y) represents the clutter that was not
expected nor modeled by the template densities and acts like
a noise term to the detection rule. Looking at the binary
decision case for simplicity, each of the two training densities
has a set of moments and a corresponding precomputed
lambda vector. Then associated with the target template are
$[M_T]$ and $[\Lambda_T]$ while the clutter template has associated
vectors $[M_C]$ and $[\Lambda_C]$.

Using the minimum cross-entropy decision rule will first
require computing a set of moments over the information cell

$$\iint q(x,y)\,f_i(x,y)\,dx\,dy = m_i \qquad (i = 0 \ldots t)$$

Substituting in the additive noise model gives the expression

$$\iint s(x,y)\,f_i(x,y)\,dx\,dy + \iint n(x,y)\,f_i(x,y)\,dx\,dy = m_i \qquad (i = 0 \ldots t)$$

Written as a vector this expression becomes

$$[M_S] + [M_N] = [M]$$

The first moment vector $[M_S]$ represents one of the two expected
signals, i.e. target or clutter. The second term $[M_N]$
is a noise perturbation vector caused by the unexpected
clutter in the scene. The minimum cross-entropy detection
algorithm templates are produced using a uniform prior density
and thus have maximum entropy consistent with the moment
constraints. Entropy has been related to scene structure
(Watanabe, 1981), where structure refers to the confinement
of scene energy to a small number of pixels and large transitions
in pixel energy. The smaller the scene entropy the
larger the scene structuredness. The minimum cross-entropy
templates thus have as little structure as possible and still
conform to the moment constraints. The normalized scene energy
is smoothly spread over as many pixels as allowed by the
moment constraints to produce the template densities. The
minimum cross-entropy decision rule is thus inherently robust
(Rey, 1978) to small moment perturbations which correspond to
changes in the assumed underlying density. The changes in
the underlying distribution have minimal impact on the detection
algorithm since small perturbations are smoothed away in
the process of constructing the maximum entropy templates.

The minimum cross-entropy decision rule then takes the
perturbed information cell moment vector and performs a dot
product operation with each of the stored template lambda
vectors to produce

$$[\Lambda_C]^{T}[M] \;\underset{H_C}{\overset{H_T}{\gtrless}}\; [\Lambda_T]^{T}[M]$$

where $H_T$ corresponds to selecting the target hypothesis and
$H_C$ corresponds to selecting the clutter hypothesis. Expanding
the moment vector into its components gives

$$[\Lambda_C]^{T}[M_S] + [\Lambda_C]^{T}[M_N] \;\underset{H_C}{\overset{H_T}{\gtrless}}\; [\Lambda_T]^{T}[M_S] + [\Lambda_T]^{T}[M_N]$$

Given that the signal term corresponds to a target, the first
term above will be a relatively large constant (K) while
the third term will be a small constant (k). The results
occur since both terms represent a cross-entropy that has
been shown to be positive in all cases and very small for
corresponding moment and lambda vectors. Therefore, the
decision rule becomes

$$(K - k) \;\underset{H_C}{\overset{H_T}{\gtrless}}\; [\Lambda_T]^{T}[M_N] - [\Lambda_C]^{T}[M_N]$$

The terms on the right can be viewed as a particular realization
of clutter from a large ensemble of possible clutter
configurations and are each thus realizations of random
variables. Then the decision rule can be written as

$$D = (K - k) \;\underset{H_C}{\overset{H_T}{\gtrless}}\; r_T - r_C = r$$

Since the random variable r is formed as the difference of
similar random variables, r should have approximately zero
mean and some variance $\sigma^{2}$. Type II errors are then made
when r exceeds D and hypothesis $H_C$ is declared to be
true. Using Tchebycheff's Inequality gives an immediate
probability of error expression for equally likely hypotheses as

$$P(\epsilon) = P(|r| > D) \le \frac{\sigma^{2}}{D^{2}}$$

Because the random variables $r_T$ and $r_C$ are formed as a
sum of approximately independent random variables as

$$r_T = \lambda_1^{(T)} m_1^{(N)} + \lambda_2^{(T)} m_2^{(N)} + \ldots + \lambda_t^{(T)} m_t^{(N)}$$

and

$$r_C = \lambda_1^{(C)} m_1^{(N)} + \lambda_2^{(C)} m_2^{(N)} + \ldots + \lambda_t^{(C)} m_t^{(N)}$$

the central-limit theorem (Papoulis, 1965:266) applies and
r will approach a Gaussian density as t becomes large.
The probability of error can then be expressed as

$$P(\epsilon) = \tfrac{1}{2} - \operatorname{erf}\left(\frac{D}{\sigma}\right)$$

where

$$\operatorname{erf}(x) = \frac{1}{\sqrt{2\pi}}\int_{0}^{x} e^{-u^{2}/2}\,du$$

Now given the ratio D/σ, a much better estimate of P(ε) can
be achieved with this approximation than the upper bound
provided by Tchebycheff's Inequality. Figure 5.2 shows how
P(ε) varies with D/σ.
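The gap between the Tchebycheff bound and the Gaussian approximation is easy to tabulate. The sketch below assumes the error function convention of the form (1/sqrt(2π)) ∫ exp(-u²/2) du from 0 to x, which equals half of the standard library `math.erf` evaluated at x/sqrt(2); the function names are hypothetical.

```python
from math import erf, sqrt


def chebyshev_bound(d_over_sigma):
    """Upper bound sigma^2 / D^2 on the error probability (capped at 1)."""
    return min(1.0, 1.0 / d_over_sigma ** 2)


def gaussian_error(d_over_sigma):
    """One-sided Gaussian tail: 1/2 minus the integral of the unit normal
    density from 0 to D / sigma."""
    return 0.5 - 0.5 * erf(d_over_sigma / sqrt(2.0))


for ratio in (1.0, 2.0, 3.0):
    print(ratio, chebyshev_bound(ratio), gaussian_error(ratio))
```

At D/σ = 2 the bound gives 0.25 while the Gaussian approximation gives roughly 0.023, which illustrates why the Gaussian form is the more useful estimate once the ratio is known.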

Both approaches to P(ε) have shown a dependence on the
ratio σ/D in predicting the minimum cross-entropy detection
algorithm expected performance. The requirements for high
performance are to keep the clutter standard deviation sigma
as small as possible and generate a large correlation
difference D. The factors which combine to determine detection
performance are now seen to be the number of moments used to
characterize the target, the relative target to scene area
and the amount of clutter modeled with the target in the
template density. Increasing the number of information functions,
and thus the number of moments used to characterize the
target, will increase D since the k term will approach
zero as more functions are used in the minimum cross-entropy
template. The variance sigma squared can be decreased by
reducing the scene/target ratio. The increased relative target
size will then allow smaller variation in the clutter field
since more of the scene will be represented by the s(x,y)
term of the additive model. The variance can also be
decreased by having more training densities with probable clutter
configurations built into the target model. Again the
signal term will account for more of the clutter and reduce
the possible variance associated with the noise term n(x,y)
in the additive model.

The  clutter  variance,  however,  can  not  be  measured  since 
the  ensemble  of  clutter  fields  is  not  known  and  tlierefore, 
this  is  not  a  practical  method  of  projecting  the  detection 
algorithm  performance.  Cross-entropy  H(q,t)  defined  as 


H(q.t)  = 


// 

call 


q(x,y)ln 


~q(x,y) 
t( x.y) 


dxd; 
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provid  s  an  alternate  method  of  predicting  the  detection  rule 
performance.  The  cross-entropy  H(q,t)  =  H(q,p)  -  H(t,p) 
serves  as  a  measure  of  the  distance  between  the  scene  density 
q(x,y)  and  the  template  density  t(x,y).  Minimizing  H(q,t) 
has  the  effect  of  minimizing  the  number  of  pixel  differences 
between  the  training  template  and  the  scene  density  and  there¬ 
fore  limits  the  magnitude  of  the  clutter  variance  sigma 
squared.  Minimizing  the  template  to  scene  cross-entropy 
H(q,t)  is  thus  equivalent  to  minimizing  the  clutter  variance. 
The use of H(q,t) to minimize the number of pixel differences
is analogous to the procedure employed by Watanabe
(Watanabe,  1965)  in  showing  that  the  Karhunen-Loeve  expansion 
of  the  scene  density  minimizes  the  entropy  of  the  squared 
transform  coefficients  over  the  ensemble  of  possible  orthogo¬ 
nal  coordinate  systems.  The  analogous  result  is  that  the 
Karhunen-Loeve  coordinate  system  minimizes  the  number  of  terms 
required  to  represent  the  image  density. 
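
As a rough illustration of the definition above, the cross-entropy integral can be approximated on a sampled image by a sum over pixels. The sketch below is not taken from the dissertation's software; the function name `cross_entropy` and the nested-list image layout are assumptions made for the example, and the template is assumed to be strictly positive wherever the scene is nonzero.

```python
import math

def cross_entropy(q, t):
    """Discrete approximation of H(q,t) = sum q ln(q/t) over a pixel grid.

    q, t : equal-sized 2-D lists of non-negative gray levels (hypothetical layout).
    Both arrays are normalized to unit mass before the sum is taken.
    """
    total_q = sum(map(sum, q))
    total_t = sum(map(sum, t))
    h = 0.0
    for row_q, row_t in zip(q, t):
        for qv, tv in zip(row_q, row_t):
            qn, tn = qv / total_q, tv / total_t
            if qn > 0.0:              # 0 * ln(0) is taken as 0
                h += qn * math.log(qn / tn)
    return h

# A scene concentrated in the top row, compared with a uniform template.
scene = [[0.5, 0.5], [0.0, 0.0]]
uniform = [[1.0, 1.0], [1.0, 1.0]]
print(cross_entropy(scene, uniform))   # ln 2, since half the cells are empty
print(cross_entropy(scene, scene))     # 0.0 for identical densities
```

With a uniform template the same routine gives the scene cross-entropy H(q,p) discussed in the text.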

The next chapter uses H(q,t) to tie together the effects
of increasing the number of information functions, varying
the scene/target ratio and increasing the number of target
templates. In this work the triangle equality
H(q,t) = H(q,p) - H(t,p) is used to indirectly evaluate this
template to scene metric. To use the triangle equality it has
been assumed that the scene density is well represented with
a finite number of information functions. The final step in
evaluating the detection algorithm then relates the cross-entropy
metric H(q,t) to the expected probability of error.


Chapter  VI.  Processing  Results 


The performance factors identified in the last chapter,
which collectively determine the target detecting ability of
the minimum cross-entropy detection algorithm, are explored
in this chapter. The basis of this work is the template
to scene distance H(q,t) and the triangle equality

H(q,t) = H(q,p) - H(t,p)

The performance factors are each related to the cross-entropy
H(q,t), and a set of test pictures is then used to estimate the
relationship between cross-entropy and probability of error.
The results presented here tie together the factors determining
the error probability. They allow a user to select an operating
point and project a probable performance or, conversely, to
select a required error probability and know the constraints
imposed on the detection algorithm.

Performance  Factors 

The number of information functions used in the minimum
cross-entropy templates determines H(t,p) and, through the
triangle equality, also H(q,t). Any scene can be represented
exactly with a density of the form
$\exp\left(-\sum_k \lambda_k f_k(x,y)\right)$, where
$\{f_k(x,y)\}$ defines a complete orthogonal set of functions in
$R^2$. Thus as information functions are added to the template
density, the prior to template cross-entropy H(t,p) will
increase, so that the template to scene distance H(q,t)
decreases and eventually approaches zero. This behavior
has been verified using the block tank shown in Figure 6.4
and the basic definition of cross-entropy. The block tank
has a relatively high cross-entropy value of H(q,p) = 1.965
due to its structure, or confinement of scene energy, and its
abrupt changes in density. Also, the block tank density has
no camera noise superimposed on the scene to blur the
picture and reduce template entropy. The minimum cross-entropy
approximation to the block tank was computed using
second through twelfth order moment information and the
resulting template cross-entropy H(t,p) computed. Figure 6.1
shows the template to scene distance as a function of the
number of information functions used in the template. Note
that the resulting data points can be approximated with a
straight line.
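
The link between moment order and the number of information functions can be sketched with a short count. This assumes that every raw moment of x^p y^q with p + q no greater than the chosen order contributes one information function, which agrees with the counts quoted in the captions of the tank reconstruction figures (6 functions for second order, 21 for fifth order, 78 for eleventh order). The function name is hypothetical.

```python
def num_information_functions(order):
    """Count the monomials x**p * y**q with p + q <= order (assumed one function each)."""
    return (order + 1) * (order + 2) // 2

# Second through twelfth order, the range used for the block tank.
for n in range(2, 13):
    print(n, num_information_functions(n))
```

Under this assumption the twelfth order template uses 91 information functions.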

The minimum cross-entropy template densities produced
for second through twelfth order moment information are given
in Figures 6.5 through 6.15 respectively. The template
densities are shown to have increasing structure and
cross-entropy H(t,p) values as the number of moment constraints
is increased. The block tank was used to clearly show the
impact on the resulting template of the minimum cross-entropy
requirement and the conflicting requirement to conform to the
moment information.

Fig. 6.1. Template to Scene Cross-Entropy versus Approximation Order


The scene cross-entropy H(q,p) measured using a uniform
prior density has a finite maximum value over the unit
square. Using the clutter background that produces this
maximum value for scene cross-entropy allows a worst case
evaluation of the relationship between H(q,p) and the
scene/target ratio. The maximum value of scene cross-entropy was
experimentally found to be H(q,p) = 2.418 over the unit
square. With the maximum cross-entropy clutter uniformly
distributed over the unit square, a tank was placed in the
center of this cluttered scene. The size of the tank was
then steadily decreased to produce Figure 6.2, which shows the
increase in scene cross-entropy as the relative target size
is decreased. The general shape will always hold true, with
the initial cross-entropy value a function of the selected
target and H(q,p) asymptotically approaching the maximum
value for large scene/target ratios. Note that the operating
point used for the test set of images is shown on the graph.

Scene cross-entropy H(q,p) is one of two components of
H(q,t), as shown by the triangle equality. This first
component of the template to scene distance defines the value
that the minimum cross-entropy template density must strive
for to minimize the resulting template to scene distance
H(q,t). Two factors combine to determine the template
cross-entropy. The first factor is camera noise, which tends to
reduce H(t,p) and thus increase the resulting template to
scene distance. The second factor is the amount of information
used to produce the template, or the number of information
functions. The template cross-entropy H(t,p) moves
toward H(q,p) as more information functions are added to
the minimum cross-entropy template. The two concepts of
relative target size and the number of information functions
used in the templates can be combined into one multilevel
plot as given in Figure 6.3. The straight line approximation
for H(t,p) has been used to simplify the presentation. The
first two performance factors, scene/target ratio and the
number of information functions in the template, have been
tied together with the template to scene distance H(q,t).
The next step is to estimate the relationship between H(q,t)
and the probability of error.

Fig. 6.2. Scene Cross-Entropy versus Scene/Target Ratio


Approximate  Error  Probability 

The first step in estimating P(ε) for the minimum
cross-entropy detection algorithm is to evaluate performance
without the interfering clutter background. Half the test
scenes contain a target and the other half have only a
uniform background. The target in the first half of the test
scenes was placed at various locations and orientations
within the information cell to ensure that each target scene is
unique. The initial detection rule test has thus been reduced to
using moments as an object descriptor, an area where they have
been applied extensively. Starting with early work in
character recognition (Hu, 1963), (Alt, 1962), invariant moments
have proven useful in locating known objects on a uniform
background. Automatic interpretation of ship photographs
using spatial moments (Smith, 1971) has obtained performance
comparable to human photointerpreters, however, again on a
uniform background.

Fig. 6.5. Minimum Cross-Entropy Tank Reconstruction (6 Information Functions)

Fig. 6.8. Minimum Cross-Entropy Tank Reconstruction (21 Information Functions)

Fig. 6.9. Minimum Cross-Entropy Tank Reconstruction (28 Information Functions)

Fig. 6.10. Minimum Cross-Entropy Tank Reconstruction (36 Information Functions)

Fig. 6.11. Minimum Cross-Entropy Tank Reconstruction (45 Information Functions)

Fig. 6.13. Minimum Cross-Entropy Tank Reconstruction (66 Information Functions)

Fig. 6.14. Minimum Cross-Entropy Tank Reconstruction (78 Information Functions)

S. B. Dudani extended the moment invariant concept to
the identification of three-dimensional objects in his
master's thesis (Dudani, 1971) and in his doctoral dissertation
(Dudani, 1973), where he conducted an experimental study of
aircraft identification using moment methods. Dudani oriented
his work toward video imagery and used only the information
contained in the second and third order moments calculated
over the image silhouette and boundary to provide the
information for his target classification rule. He then used
a test set of approximately 100 images and various classifiers
(Bayes, K-nearest neighbor and sequential) to show that the
classifiers' performance was superior to that obtained with
human test subjects.

With this historical background of moments used as an
object descriptor, it is not surprising that the minimum
cross-entropy detection algorithm performs well without an
interfering clutter background. In fact, the classifier correctly
recognized every scene in a test group of twenty pictures.
Half of these pictures contained tanks on a uniform background
and the other half only the uniform background. The minimum
cross-entropy rule was shown to function correctly as an
object descriptor and has performed at least as well as earlier
target detection rules on this limited set of test data.

Looking again at the history of moment methods, the next
logical step would be for someone to find objects in
cluttered scenes using moments. Wong and Hall (Wong, 1978) (Hall,
1979) have tried this concept by using scene invariant moments
as a similarity measure in matching or registration of radar
and optical images. Most researchers have, however, attempted
to isolate a candidate pattern from its background by
preprocessing the picture before attempting target classification.
This approach to clutter occurred, as stated by Nill (Nill,
1981), since it was assumed that otherwise there would be
little chance of recognizing a pattern when the moments consist
of contributions from the pattern and background clutter
combined. The preprocessing, however, produces its own errors
and destroys information in the original scene. The minimum
cross-entropy detection rule provides an alternative to the
preprocessing requirement by accounting for clutter with the
templates and then being robust to clutter perturbations that
occur in the actual scene.

The exact relationship between error probability and
template to scene cross-entropy cannot be established
analytically and therefore must be estimated experimentally. A
set of test scenes and a set of templates are both required
for this experiment. The set of test scenes is represented
by Figure 6.16, which shows one of the fifty tank in clutter
pictures, and Figure 6.17, which shows one of the fifty clutter
pictures. The 100 test scenes were all unique and contained
a broad spectrum of possible clutter configurations. The
selected template densities are provided in Appendix A with
their cross-entropy values. The templates use circular disks
to represent the clutter and provide nine target-clutter
alternatives for the minimum cross-entropy detection rule. The
clutter model is a simplified version of one used in an
Environmental Research Institute of Michigan (ERIM) Report
(Wilkins, 1977) to provide a means of scene modeling and of
generating "typical" scenes. The ERIM modeling procedure uses
elliptical areas to represent a background scene and produce
a pseudo-image whose spatial characteristics approximate those
of the original image. This method of generating typical
scenes is attractive: as discussed by Teague (Teague, 1980),
when only moments up through second order are considered, all
objects are completely equivalent to a constant irradiance
ellipse having definite size, orientation, and eccentricity
and centered at the object centroid. Besides making intuitive
sense, the model is supported by experiment: ERIM has found
that the performance of sensors against the actual background
and against the simulated background is essentially the same,
and thus the salient spatial features of the background have
been preserved with the pseudo-image.

The circular disk is a degenerate ellipse with no
orientation information and thus this clutter model is very
similar to the pseudo-images used by ERIM. The templates with
clutter disks in Appendix A do suffer from superimposed
camera noise, which detracts from their effectiveness in the
target detection algorithm. The noise impact is shown in
Appendix B, where the minimum cross-entropy densities corresponding
to the templates of Appendix A are shown. The minimum
cross-entropy densities that are used in the detection rule have
low prior to template cross-entropy values because of the
camera noise. The smaller template H(t,p) values result
in larger average template to scene distances and larger
probability of error figures. Despite the known impact of
camera noise, no method could be found that would remove it
without distorting the template.

Using the 100 test scenes, each with cross-entropy
$H_i(q,p)$, and the 18 training templates, each with minimum
cross-entropy $H_k(t,p)$, a test run of the detection rule
was conducted. In the test procedure P(tank) = P(clutter) = 1/2
and the algorithm selects a template from the training
set as the nearest match to each of the test scenes. Thus
for each of the 100 test pictures we have the relationship

$$H_i(q,t) = H_i(q,p) - H_k(t,p), \qquad i = 1, \ldots, 100, \quad k \in \{1, \ldots, 18\}$$

where k is the template selected by the detection rule.


The larger the set of template densities, the more likely one
of the predefined template clutter configurations will
correspond closely with the actual test scene and result in the
correct classification of that scene. Figure 6.18 displays
the performance obtained over the test set of scenes as the
number of template alternatives is increased from two to
eighteen. The performance improvement stops in this test of the
detection algorithm, but theoretically performance will
continue to improve as more and more templates are available for
comparison with each test scene. The departure from theory
in this test can be attributed to the small size of the test
scene set. Figure 5.2, which relates error probability
to the ratio of correlation difference to clutter standard
deviation, shows that there is an expected slowing in performance
improvement as templates are added to reduce the clutter
standard deviation. The test set error probability of 0.19
corresponds to a correlation/clutter ratio of approximately 0.9
in Figure 5.2 and a region of rapidly decreasing slope in the
graph. Thus it is expected to require a large change in the
number of templates and the resulting correlation/clutter
ratio to produce further substantial improvement in the error
probability for the detection rule. The improvement of P(ε)
with increasing numbers of templates can also be related to
the improvement in the average template to scene cross-entropy
of the test set, defined as

$$\bar{H}(q,t) = \sum_{i=1}^{100} H_i(q,t)/100$$

which will decrease as better template to scene matches result
from the larger set of template alternatives. The template
to scene cross-entropy H(q,t) thus has a well defined
relationship to all three performance factors. The cross-entropy
H(q,t) can also be related to the error probability, which
is normally used to characterize a recognition system's
performance. Kovalevsky (Kovalevsky, 1980:78) explores the
relationship between changes in entropy and probability of
error. He was not able to find an exact functional
relationship between probability of error and entropy but has
established a definite relationship between these two performance
indicators. The results show that for a given entropy change
the error probability can vary only between definite limits
and, conversely, for a given error probability P the entropy
lies between limits that are a function of P. Since using
uniform priors in the cross-entropy expressions results in an
equivalence between cross-entropy and entropy, Kovalevsky's
results apply to this work also, since

H(q,t) = H(q,p) - H(t,p)

is exactly a change in entropy.

The bounds provided by Kovalevsky are interesting but
will not allow the selection of H(q,t) based on a system
error probability requirement. Partitioning the set
of test scene cross-entropy values into four equal size bins,
each with a resulting P(ε) and average cross-entropy,
provides an approximation to the relationship between
cross-entropy and probability of error. Figure 6.19 provides a
broken line plot of the resulting error probability plotted
against the average bin cross-entropy values. The straight
line plot was obtained using the overall test set error
probability as a pivot point and then providing a minimum
difference compromise between the more error prone bin
cross-entropy values.
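
The binning step described above can be sketched as follows. This is only an illustration of the procedure, not the original analysis code: the function name, the argument layout and the eight sample values are all made up for the example, and the scene count is assumed to divide evenly by the number of bins (as 100 scenes do with four bins).

```python
def bin_error_rates(cross_entropies, errors, bins=4):
    """Sort scenes by H(q,t), split into equal-size bins, and return
    (average cross-entropy, error fraction) for each bin.

    cross_entropies : H(q,t) value for each test scene
    errors          : True where the detection rule misclassified the scene
    """
    pairs = sorted(zip(cross_entropies, errors))
    size = len(pairs) // bins          # any leftover scenes would be dropped
    result = []
    for b in range(bins):
        chunk = pairs[b * size:(b + 1) * size]
        mean_h = sum(h for h, _ in chunk) / len(chunk)
        p_err = sum(1 for _, e in chunk if e) / len(chunk)
        result.append((mean_h, p_err))
    return result

# Illustrative values only; these are not the dissertation's measurements.
h_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
mistakes = [False, False, False, True, False, True, True, True]
for mean_h, p_err in bin_error_rates(h_values, mistakes):
    print(round(mean_h, 3), p_err)
```

Plotting the returned pairs gives a broken line of the kind described for Figure 6.19.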

Figure  6.19  clearly  shows  that  decreasing  the  template 
to  scene  cross-entropy  values  will  improve  the  expected  error 
probability  of  the  detection  algorithm.  Adding  template  al¬ 
ternatives  and  reducing  the  camera  noise  superimposed  on  the 
templates  will  therefore  result  in  improved  performance.  In¬ 
creasing  the  number  of  information  functions  and  decreasing 
cell  size  will  also  result  in  smaller  cross-entropy  H(q,t) 
values  and  thus  improved  performance.  Figures  6.19,  6.18, 
and  6.3  together  allow  a  system  error  probability  requirement 
to  define  the  required  performance  factor  values.  Conversely 
the  graphs  tie  together  the  performance  factors  determining 
error  probability  and  given  an  operating  point  provide  a  good 
estimate  of  expected  system  performance. 

The minimum cross-entropy detection rule has many
attractive attributes. The rule has been shown to be optimal in a
well defined information theoretic sense. Also, the algorithm
is computationally efficient in only computing moments and
dot products on line. Finally, as shown in this chapter, the
detection rule is robust in maintaining performance with a
range of underlying clutter density configurations.

Fig. 6.19. Probability of Error versus Cross-Entropy


Chapter VII. Summary and Future Research


Summary 

Based  on  the  desired  properties  of  any  inference  proce¬ 
dure  stated  as  four  consistency  axioms,  this  dissertation  has 
used  the  concept  of  minimum  cross-entropy  to  develop  a  target 
in  clutter  detection  algorithm.  The  algorithm  uses  minimum 
cross-entropy  templates  that  are  constructed  using  all  avail¬ 
able  moment  information,  but  maintaining  "maximum  uncertainty" 
with  respect  to  unspecified  information.  This  construction 
technique  provides  a  "minimally  prejudiced"  template  and  re¬ 
sults  in  a  detection  rule  that  is  robust  to  clutter  perturba¬ 
tions  in  the  actual  scene.  The  development  requires  informa¬ 
tion  in  the  form  of  two-dimensional  moments  that  are  convert¬ 
ed  into  expected  values  of  an  orthonormal  set  of  information 
functions  constructed  with  Legendre  polynomials.  The  work  is 
based  on  a  constrained  optimization  problem  and  includes 
three  procedural  steps:  specification  of  the  set  of  template 
densities,  solution  of  the  constraint  equations  to  completely 
define the minimum cross-entropy template and the use of
cross-entropy to match actual scenes with the predefined
templates.


The properties of cross-entropy minimization were
reviewed, showing the existence of a unique solution to the
constrained optimization problem. Further, the solution density
was shown to take the following form:

$$t(x,y) = p(x,y)\exp\left[-\lambda_0 - \lambda_1 f_1(x,y) - \cdots - \lambda_N f_N(x,y)\right]$$

where the $f_k(x,y)$ are information functions, the $\lambda_k$ are
the associated Lagrange multipliers, and p(x,y) is the
assumed prior density. A numerical scheme based on the Cyclic
Coordinate Method was presented to solve the constraint
equations recast as a variational problem for a potential
functional. The potential functional is concave for any trial
set of Lagrange parameters, so the search will always
converge to a global solution.
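
The flavor of a cyclic coordinate search for the Lagrange multipliers can be conveyed with a small one-dimensional sketch. This is not the dissertation's implementation. It assumes a discretized prior, uses the dual function ln Z(λ) + Σ λ_k m_k written with a sign convention under which it is minimized, and updates one multiplier at a time with a halved Newton step. All names (`solve_lambdas`, `funcs`, `targets`) and the toy moment values are hypothetical.

```python
import math

def solve_lambdas(prior, funcs, targets, sweeps=200, tol=1e-7):
    """Cyclic coordinate search for t_i = prior_i * exp(-lam0 - sum_k lam_k f_k[i]).

    prior   : prior probabilities on the grid cells (sum to 1)
    funcs   : funcs[k][i] is information function k evaluated at cell i
    targets : required expected value of each information function
    """
    n, K = len(prior), len(funcs)
    lam = [0.0] * K

    def density(lam):
        w = [prior[i] * math.exp(-sum(lam[k] * funcs[k][i] for k in range(K)))
             for i in range(n)]
        z = sum(w)
        return [wi / z for wi in w], z

    def potential(lam):
        # ln Z(lam) + sum_k lam_k * m_k; its gradient vanishes exactly
        # when every moment constraint is met.
        return math.log(density(lam)[1]) + sum(l * m for l, m in zip(lam, targets))

    for _ in range(sweeps):
        worst = 0.0
        for k in range(K):                      # one coordinate at a time
            t, _ = density(lam)
            mean = sum(ti * fi for ti, fi in zip(t, funcs[k]))
            var = sum(ti * (fi - mean) ** 2 for ti, fi in zip(t, funcs[k]))
            grad = targets[k] - mean            # d(potential)/d(lam_k)
            worst = max(worst, abs(grad))
            if var <= 0.0:
                continue
            step, base = -grad / var, potential(lam)
            while abs(step) > 1e-15:            # halve the step until it does not hurt
                trial = lam[:]
                trial[k] += step
                if potential(trial) <= base:
                    lam = trial
                    break
                step *= 0.5
        if worst < tol:
            break
    t, z = density(lam)
    return lam, math.log(z), t                  # multipliers, lam0, template


# Toy problem: uniform prior on [-1, 1], Legendre P1 and P2 as information functions.
xs = [-1.0 + 2.0 * (i + 0.5) / 101 for i in range(101)]
prior = [1.0 / len(xs)] * len(xs)
funcs = [xs, [(3.0 * x * x - 1.0) / 2.0 for x in xs]]
lam, lam0, t = solve_lambdas(prior, funcs, targets=[0.2, 0.1])
```

The returned template `t` reproduces the two requested expected values while staying as close to the uniform prior as the constraints allow; the two-dimensional case differs only in how the cells and information functions are indexed.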

The  general  target  classification  algorithm  is  developed 
based  on  the  ability  of  cross-entropy  to  measure  how  much  a 
scene  density  differs  from  a  predefined  template  density. 
Using  the  triangle  equality  and  the  posterior  adaptation  pro¬ 
perty  of  minimum  cross-entropy  densities  results  in  a  fast 
on-line  numerical  implementation  of  the  classification  rule. 
The  on-line  processing  was  reduced  to  a  dot  product  operation 
between  the  scene  moment  vector  and  all  stored  template  lamb¬ 
da  vectors  followed  by  a  search  for  the  minimum  dot  product 
value.  Conceptually,  the  minimum  cross-entropy  classifier 
looks  for  the  template  lambda  vector  most  nearly  orthogonal 
to  the  scene  moment  vector  in  the  decision  space. 
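
A minimal sketch of the on-line step follows. For a template of the exponential form given above, the definition of cross-entropy gives H(q,t_k) = H(q,p) + λ_0 + Σ_j λ_j E_q[f_j], and H(q,p) is the same for every template, so the nearest template is the one with the smallest dot product. The sketch assumes that each stored lambda vector begins with λ_0 and that the scene vector begins with a 1 to multiply it; this layout, the function name and the numbers are illustrative assumptions.

```python
def classify(moment_vector, lambda_vectors):
    """Return (index of best template, list of dot products).

    moment_vector  : [1, E_q[f_1], ..., E_q[f_N]] for the scene
    lambda_vectors : one [lam_0, lam_1, ..., lam_N] list per stored template
    """
    scores = [sum(l * m for l, m in zip(lam, moment_vector))
              for lam in lambda_vectors]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Two hypothetical templates with two information functions each.
templates = [[0.5, 1.0, -2.0],
             [0.1, -1.0, 0.5]]
scene = [1.0, 0.3, 0.2]
best, scores = classify(scene, templates)
print(best, scores)
```

Only the scene moments and one dot product per template are computed on line; the Lagrange multipliers come from the preprocessing stage.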

Detection rule performance was examined, resulting in the
identification of relative target size, number of information
functions used in the template and number of template
alternatives as the major performance determining factors. All
three performance factors were related to a template to scene
distance H(q,t) to show how these factors are interrelated
in determining performance. Finally, a set of 100 test
scenes was processed using the decision rule to estimate the
relationship between the template to scene cross-entropy
H(q,t) and the probability of error. The entire target
detection procedure has been programmed and tested for
computer use.

Future  Research 

The research conducted in developing the minimum
cross-entropy detection algorithm revealed several areas for
continued investigation. These research areas are outlined in
the following paragraphs.

The  area  of  template  selection  offers  large  dividends 
in  improved  detection  algorithm  performance.  A  method  of 
producing  templates  without  camera  noise  will  result  in  an 
immediate  performance  improvement.  The  major  research  area 
is, however, an optimum method of clutter placement within the
template  coupled  with  an  analysis  of  the  optimum  number  of 
templates  for  a  given  number  of  information  functions  and 
scene/target  ratio.  The  performance  impact  of  using  ellipses 
rather  than  circles  in  the  clutter  model  should  also  be  ad¬ 
dressed. 

Using integral transforms (Wolf, 1979), the detection
problem can be moved to a transform domain, which could
improve error probability performance by removing some of the
initial target orientation uncertainty. As examples, the
magnitude of the Fourier transform of an object or function
is invariant to a shift in the function and the Mellin
transform is invariant to scale change in the input function.
Casasent discusses these transforms and combinations such as
Fourier-Mellin transforms coupled with geometrical
transformations (Casasent, 1979) that provide positional, rotational,
and scale invariance. In a transform domain that reduces
initial target orientation uncertainty, the research would
explore the optimum selection of information functions. The
information function set selected would depend on the target
of interest and could achieve improved probability of error
performance with a simplified template model and reduced
preprocessing workload.
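
The shift invariance of the Fourier magnitude mentioned above can be checked numerically. The sketch below uses a direct discrete Fourier transform on a short one-dimensional sequence and a circular shift, for which the invariance is exact; the function names and the sample sequence are made up for the illustration.

```python
import cmath

def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform, computed directly."""
    n = len(signal)
    return [abs(sum(signal[j] * cmath.exp(-2j * cmath.pi * j * k / n)
                    for j in range(n)))
            for k in range(n)]

def circular_shift(signal, s):
    """Shift the sequence s places to the right, wrapping around."""
    s %= len(signal)
    return signal[-s:] + signal[:-s]

profile = [0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 0.0, 0.0]
original = dft_magnitudes(profile)
shifted = dft_magnitudes(circular_shift(profile, 3))
print(max(abs(a - b) for a, b in zip(original, shifted)))  # near machine precision
```

A shift changes only the phase of each transform coefficient, which is why features built from the magnitudes do not depend on target position.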

The  target  detection  algorithm  coupled  with  preprocess¬ 
ing  algorithms  such  as  edge  detectors  (Abdou,  1978)  is  anoth¬ 
er  area  for  further  research.  The  information  cell  moments 
would  be  calculated  over  an  image  silhouette  and  boundary 
after  the  preprocessing  operations.  Using  the  preprocessing 
approach  would  result  in  a  method  similar  to  Dudani’s  (Dudani, 
1973) that uses the preprocessing to remove unimportant
information from the image and increase the template
cross-entropy and thus improve error probability performance.

The research centered on characterizing two-dimensional
density functions to develop the detection rule, yet the
theoretical development supports a three-dimensional
characterization. Using a stereo vision system or a scanning laser
rangefinder, a range image is obtained where gray level
represents not brightness, but the distance from the camera to
the reflecting surface in the scene (Castleman, 1979: 3^9).
The combination of a brightness image and range image
produces an approximate three-dimensional image density function
$\rho(x,y,z)$. The three-dimensional moments of order p + q + s
of the density $\rho(x,y,z)$ are then defined by

$$M_{pqs} = \iiint_{\text{cube}} x^p y^q z^s \rho(x,y,z)\,dx\,dy\,dz$$

With the z axis perpendicular to the principal axis of the
pattern, we have an immediate invariant coordinate system for
the minimum cross-entropy density approximation. After
defining a set of three-dimensional information functions, all
concepts carry forward from the two-dimensional case. In
three dimensions the solution to the constrained optimization
problem takes the form

$$t(x,y,z) = p(x,y,z)\exp\left[-\lambda_0 - \sum_{k=1}^{N} \lambda_k f_k(x,y,z)\right]$$

where p(x,y,z) is the prior density. The template clutter
model extends the two-dimensional clutter ellipses to
three-dimensional clutter ellipsoids and the detection rule
remains unchanged. Research into a three-dimensional target
detection rule could provide a method of detecting objects
in oblique aerial scenes.
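
As a rough numerical illustration of the three-dimensional moment definition, the triple integral can be approximated with a midpoint rule over a voxel grid. The sketch assumes the density is sampled on an n × n × n grid covering the unit cube; the function name, the grid layout and the uniform test density are assumptions made for the example.

```python
def moment_3d(density, p, q, s):
    """Midpoint-rule estimate of M_pqs for a density sampled on the unit cube.

    density : n x n x n nested list, density[i][j][k] at the voxel whose
              center is ((i+0.5)/n, (j+0.5)/n, (k+0.5)/n)
    """
    n = len(density)
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            for k in range(n):
                z = (k + 0.5) * h
                total += (x ** p) * (y ** q) * (z ** s) * density[i][j][k]
    return total * h ** 3

# Uniform density over the unit cube: total mass 1, centroid at 0.5 on each axis.
n = 4
uniform = [[[1.0] * n for _ in range(n)] for _ in range(n)]
print(moment_3d(uniform, 0, 0, 0), moment_3d(uniform, 1, 0, 0))
```

Moments computed this way would play the same role as the two-dimensional raw moments when forming expected values of three-dimensional information functions.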

Finally, this detection rule provides a means of detecting
objects in cluttered scenes and we have suggested
extensions that may improve the probability of error performance.
Potential applications in reconnaissance, industrial robots,
and imaging radar are examples that make extension of this
work a viable research area.


Appendix A. Template Densities

The eighteen template scenes used in Chapter Six to
evaluate the approximate probability of error are shown in this
appendix. Moments were taken from these template scenes to
define a set of nonlinear constraint equations for each
density. The constraint equations were solved for the lambda
vector using the cyclic coordinate search method. The
resulting lambda vector then completely defines a corresponding
minimum cross-entropy template that is provided in Appendix B.
These templates were produced from photographs and are
stored in the computer as 256 x 256 integer arrays. The
gray scale is confined to sixteen levels in the sampling
process, which accounts for the abrupt changes in intensity seen
in the templates. Note the camera noise superimposed on
these templates and their resulting low cross-entropy values.

The odd numbered templates represent clutter scenes while
the even numbered templates represent a tank with various
clutter backgrounds. Each template pair (e.g., 1 and 2) represents
a tank-clutter alternative for the detection rule. The
clutter seen in these templates comes from dark circular disks
distributed in the scene that represent various possible
clutter configurations.


Fig. A.2. Template Density 2 with Cross-Entropy = 0.86036

Fig. A.3. Template Density 3 with Cross-Entropy = 1.29606

Fig. A.4. Template Density 4 with Cross-Entropy = 0.9340?

Fig. A.6. Template Density 6 with Cross-Entropy = 0.90418

Fig. A.8. Template Density 8 with Cross-Entropy = 0.96869

Fig. A.9. Template Density 9 with Cross-Entropy = 0.86579

Fig. A.10. Template Density 10 with Cross-Entropy = 0.91397

Fig. A.11. Template Density 11 with Cross-Entropy = 0.85^47

Fig. A.12. Template Density 12 with Cross-Entropy = 0.92389

Fig. A.13. Template Density 13 with Cross-Entropy = 0.88835

Fig. A.14. Template Density 14 with Cross-Entropy = 0.91680

Fig. A.15. Template Density 15 with Cross-Entropy = 0.88692

Fig. A.16. Template Density 16 with Cross-Entropy = 0.91118

Fig. A.17. Template Density 17 with Cross-Entropy = 0.91693

Fig. A.18. Template Density 18 with Cross-Entropy = 0.88837


ADDendix  B.  Minimum  Cross-Entropy  Templates 


Fig. B.1. Minimum Cross-Entropy Template 1 with Cross-Entropy = 0.34217


Fig. B.2. Minimum Cross-Entropy Template 2 with Cross-Entropy = 0.40756


Fig. B.3. Minimum Cross-Entropy Template 3 with Cross-Entropy = 0.44235


Fig.  B.4.  Minimum  Cross-Entropy  Template  4  with  Cross-Entropy  =  0.44962 


Fig. B.5. Minimum Cross-Entropy Template 5 with Cross-Entropy = 0.41877


Fig.  B.6.  Minimum  Cross-Entropy  Template  6  with  Cross-Entropy  =  0.44274 


Fig.  B.7.  Minimum  Cross-Entropy  Template  7  with  Cross-Entropy  =  0.41746 


Fig. B.8. Minimum Cross-Entropy Template 8 with Cross-Entropy = 0.46145


Fig. B.9. Minimum Cross-Entropy Template 9 with Cross-Entropy = 0.42566


Fig. B.10. Minimum Cross-Entropy Template 10 with Cross-Entropy = 0.46862


Fig. B.11. Minimum Cross-Entropy Template 11 with Cross-Entropy = 0.43961


Fig. B.12. Minimum Cross-Entropy Template 12 with Cross-Entropy = 0.51369


Fig. B.13. Minimum Cross-Entropy Template 13 with Cross-Entropy = 0.39674


Fig. B.14. Minimum Cross-Entropy Template 14 with Cross-Entropy = 0.42777


Fig. B.16. Minimum Cross-Entropy Template 16 with Cross-Entropy = 0.41373


Fig. B.17. Minimum Cross-Entropy Template 17 with Cross-Entropy = 0.44381


Fig.  B.18.  Minimum  Cross-Entropy  Template  18  with  Cross-Entropy  =  0.41452 


Appendix C. Template Photographs

The eighteen original template photographs are provided
in this appendix. The photographs are sampled to produce the
gray scale perspective plots provided in Appendix A.
Template 1


Template  2 


Fig. C.1. Template Alternative One
Template 3


Template  4 


Fig. C.2. Template Alternative Two
Template 8


Fig. C.4. Template Alternative Four
Fig. C.5. Template Alternative Five
Template 13


Template 14


Template  16 


Fig. C.8. Template Alternative Eight


Template  17 


Template  18 


Fig. C.9. Template Alternative Nine